The CRUTEM data

This is a presentation of some curious results stemming from a simple Python programming exercise for my introductory programming class, METR 1313. The exercise involved investigating data from the famous CRUTEM dataset, the land-station component of HadCRUT. Here are two commentaries about the dataset:

The data were downloaded from the UK Met Office and placed on the server for METR 1313 students. The CRUTEM data set contains monthly averages of surface air temperature measurements at 5696 sites over land. This data can be combined with sea surface temperature data to produce a global, gridded dataset of surface air temperature, such as HadCRUT4. For the METR 1313 exercises, we did not grid the data; we kept it in its per-site form.

We do not have access to the site data that was used to construct the monthly averages within CRUTEM. We also do not have information on the calibration of the thermometers, or changes in the sites over the years, such as changes in vegetation or urbanization.

The data from each site is contained in a file whose name is the site's numerical ID. The data from these files were converted into one file that is very easy to use with Python, crutem41.pkl. The single pickle file loads easily, and has all the data and site information accessible by keys and indices of the familiar Python data structures.
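As a rough illustration of what "loads easily" means, here is a minimal sketch of writing and reading such a pickle file. The layout shown (a dictionary keyed by site ID, with "name", "lat", "lon" and "temps" entries) and the site ID "000001" are hypothetical stand-ins; the actual structure of crutem41.pkl is not described on this page.

```python
import os
import pickle
import tempfile

# Hypothetical layout: site ID -> site information and monthly temperatures.
# This is an assumption for illustration, not the real crutem41.pkl structure.
sites = {
    "000001": {
        "name": "Example site",
        "lat": 35.4,
        "lon": -97.6,
        # temps[year] is a list of 12 monthly means, January first
        "temps": {1952: [2.1] * 12, 1953: [2.4] * 12},
    }
}

# Write a small stand-in file, then load it back the way a student would.
path = os.path.join(tempfile.mkdtemp(), "crutem_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(sites, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

# Everything is reachable with ordinary dictionary keys and list indices.
july_1953 = loaded["000001"]["temps"][1953][6]  # index 6 is July
print(loaded["000001"]["name"], july_1953)
```

Only pickle.dump and pickle.load from the standard library are involved; no parsing of the original per-site text files is needed once the conversion has been done.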

To keep the exercise simple for the students, we retained only the subset of sites that have no missing data for sixty years, from 1952 through 2011, for both January and July. Since this is a freshman course, we did not attempt a statistical analysis of the trends in the surface temperature. Also, we focused on temperatures of a single month, rather than a season, so no seasonal composites were constructed. We investigated global warming by simply taking the average of the last 30 years in the data minus the average of the first 30 years.
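The calculation described above can be sketched in a few lines. This is a minimal sketch with synthetic numbers, not the class code: the names is_complete and warming, the use of None as the missing-data marker, and the site labels are all assumptions made for illustration. Only one month is shown; the same completeness check would be applied to both the January and July series.

```python
MISSING = None  # assumed marker for a missing monthly value


def is_complete(series, n_years=60):
    """True if the series has n_years values and none of them is missing."""
    return len(series) == n_years and all(v is not MISSING for v in series)


def warming(series):
    """Average of the second half of the series minus average of the first half."""
    half = len(series) // 2
    first = sum(series[:half]) / half
    last = sum(series[half:]) / (len(series) - half)
    return last - first


# Synthetic July series for 1952 through 2011 (60 values per site).
july = {
    "A": [20.0] * 30 + [21.0] * 30,   # warms by 1.0
    "B": [20.0] * 59 + [MISSING],     # one missing year, so it is dropped
    "C": [25.0] * 30 + [24.5] * 30,   # cools by 0.5
}

changes = {sid: warming(s) for sid, s in july.items() if is_complete(s)}
print(changes)
```

Because the measure is a plain difference of two 30-year means, it needs no curve fitting, which is what keeps it within reach of a first programming course.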

July temperature increase

After 3 semesters of using CRUTEM 4.1.1, the more recent version CRUTEM 4.2.0 was installed. For both data sets, the time period for the analysis is the same 60 years, as described above. A few more sites in CRUTEM 4.2.0 now offer continuous data for July. Curiously, the data for sites already in CRUTEM 4.1.1 have changed, some of them substantially. Some sites that show cooling in CRUTEM 4.1.1 now show warming, for example.

sites with 60 years of no missing data for July



a "blink" comparison of the above two images


Restriction to the midwest of the USA

The images show many sites with cooling in the midwest of the USA, particularly in CRUTEM 4.1.1. A restriction is placed on the latitude and longitude, and we find 184 sites in CRUTEM 4.1.1 and 198 sites in CRUTEM 4.2.0.
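A latitude and longitude restriction of this kind amounts to a bounding-box test on each site's coordinates. The sketch below uses made-up bounds and made-up sites; the actual limits used for the Midwest subset are not stated on this page. It also assumes that west longitudes are stored as negative numbers.

```python
# Hypothetical bounds in degrees; the page does not state the limits used.
LAT_MIN, LAT_MAX = 36.0, 49.0
LON_MIN, LON_MAX = -104.0, -80.0  # assumes west longitudes are negative


def in_box(lat, lon):
    """True if (lat, lon) lies inside the bounding box, edges included."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


# Made-up sites: ID -> (latitude, longitude)
sites = {
    "A": (41.9, -87.9),
    "B": (51.5, -0.1),
    "C": (39.1, -94.6),
}

midwest_ids = sorted(sid for sid, (lat, lon) in sites.items() if in_box(lat, lon))
print(len(midwest_ids), midwest_ids)
```

Applying the same box to each version of the data set gives the two site counts to compare.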



a "blink" comparison of the above two images


Composite of the midwest sites



a "blink" comparison of the above two images


Differences of the individual time series

We analyze the differences in the time series for the 184 sites common to both subsets of CRUTEM 4.1.1 and 4.2.0.
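One way to form such differences is sketched below, with tiny synthetic series standing in for the 60-year records. The dictionary layout, the site labels, and the choice of "new minus old" as the sign convention are assumptions for illustration, not necessarily what was done for the figures here.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Synthetic stand-ins: site ID -> July temperatures, one value per year.
old = {"A": [70.0, 71.0, 72.0], "B": [65.0, 66.0, 67.0]}
new = {
    "A": [70.5, 71.5, 72.5],
    "B": [65.0, 66.0, 67.0],
    "C": [60.0, 61.0, 62.0],  # present only in the newer version
}

# Sites present in both versions.
common = sorted(set(old) & set(new))

# Year-by-year difference (new minus old) for each common site.
diff = {sid: [n - o for o, n in zip(old[sid], new[sid])] for sid in common}

# Mean difference per site, then the average over all common sites.
mean_diff = {sid: mean(d) for sid, d in diff.items()}
overall = mean(mean_diff.values())
print(mean_diff, overall)
```

A site whose record was left untouched gives a difference series of zeros, so any nonzero entry marks a value that was revised between versions.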

It is very curious that temperature data already issued as true for a site is later modified, and often by such a large amount. Where does this site data come from? What is the algorithm for its modification? Presumably answers to those questions are available somewhere.


Note the average 0.32 F increase in the common sites between the two CRUTEM data sets. This number is close to the 0.34 F increase shown in the images of the previous section.

The new midwest sites in 4.2.0

The remainder of the increase is from 13 new sites in CRUTEM 4.2.0. (Note: the earlier captions imply that there are only 12 new sites. I do not know why 13 are found here.)
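Identifying the new sites is a set difference on the site IDs of the two subsets. The sketch below uses made-up IDs. It also lists any sites that disappear between versions, since comparing only the totals would hide those; whether that has any bearing on the count noted above is not known.

```python
# Made-up site IDs for the two Midwest subsets.
ids_411 = {"A", "B", "C"}
ids_420 = {"A", "B", "C", "D", "E"}

added = sorted(ids_420 - ids_411)    # in 4.2.0 but not in 4.1.1
dropped = sorted(ids_411 - ids_420)  # in 4.1.1 but not in 4.2.0

print(len(added), added)
print(len(dropped), dropped)
```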



Future work