Python for Probability of Precipitation Verification

This page shows my successful attempt to reproduce "Verifying probability of precipitation - an example from Finland".

The data set is available from that page. Here is the direct link: POP_3cat_2003.txt

I reproduced the 3 diagrams for the 24-hour PoP forecast verification. It should be a simple extension to verify the 48-hour forecast, as well as the PoPhi forecasts, but I have not done that. The page linked above shows some results for those 3 other forecast verifications.

Reliability diagram

I prefer making the dot size proportional to the number of forecasts. Skillful forecasts are indicated with a red dot. Lack of skill, meaning that forecasting with climatology would be more accurate than the forecast product, is indicated with a blue dot.
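The reliability diagram groups the forecasts by issued PoP and plots the observed rain frequency for each group. A minimal sketch of that grouping, assuming the forecasts have already been read into two equal-length lists (`pop` with values from 0.0 to 1.0, `observed` with 1 for rain and 0 for no rain); the column layout of POP_3cat_2003.txt is not assumed here, and the function name is hypothetical. For the red/blue decision it uses one common criterion, which is an assumption and not necessarily what verify.py does: a group is skillful when its contribution to the Brier skill score is positive, i.e. when the point lies on the skillful side of the "no-skill" line halfway between the diagonal and the climatology line.

```python
from collections import defaultdict


def reliability_table(pop, observed):
    """Return (pop_value, n_forecasts, observed_frequency, skillful) per PoP value.

    pop: forecast probabilities; observed: 1 if precipitation occurred, else 0.
    """
    if not pop or len(pop) != len(observed):
        raise ValueError("pop and observed must be non-empty and of equal length")
    climatology = sum(observed) / len(observed)  # sample base rate of rain

    groups = defaultdict(list)
    for p, o in zip(pop, observed):
        groups[round(p, 3)].append(o)  # round so parsed values like 0.1 group together

    rows = []
    for p in sorted(groups):
        obs = groups[p]
        n = len(obs)
        obar = sum(obs) / n  # observed frequency of rain for this PoP value
        # Assumed criterion: resolution term beats reliability term,
        # (obar - climatology)^2 > (p - obar)^2, so the group adds to Brier skill.
        skillful = (obar - climatology) ** 2 > (p - obar) ** 2
        rows.append((p, n, obar, skillful))
    return rows
```

To draw the diagram, each row can be passed to matplotlib's `scatter` with `s` scaled by `n_forecasts` and `c` set to red or blue from `skillful`.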

ROC curve

The reason for the full name is a bit obscure when applied to PoP forecasts, so let's just call it the ROC curve. In the plot below, each point is labeled with its threshold probability. I think such labels make the ROC curve far more useful and understandable.

Here is what ROC is all about. You make a binary decision based on the expectation of rainfall. Suppose that, for your activity, you choose a threshold PoP of 0.65. For PoP > 0.65 you take action on the presumption that rain will occur; for PoP < 0.65 you presume no rain will occur. Each point on the curve shows the hit rate and false alarm rate you would get by making decisions with the forecast at that threshold. A point above the diagonal shows skill, while deciding with climatology would yield results along the diagonal. Note the threshold labeled -0.001, which is exceeded on all days, and the threshold of 1.001, which is never exceeded; they pin the ends of the curve at (1, 1) and (0, 0).

Note the definitions:

• Hit rate: hits/(hits + misses), i.e. true positives/(true positives + false negatives)
• False Alarm Rate: false alarms/(false alarms + correct negatives), i.e. false positives/(false positives + true negatives)
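These definitions translate directly into code. A minimal sketch that computes one ROC point per threshold, assuming the same hypothetical `pop`/`observed` lists as in a reliability sketch and the strict "PoP > threshold means take action" rule described above. The threshold list is an assumption: values halfway between 10% PoP steps, consistent with the 0.65 and 0.750 labels mentioned on this page, plus the -0.001 and 1.001 end points.

```python
def roc_points(pop, observed, thresholds):
    """Return a list of (threshold, false_alarm_rate, hit_rate) tuples.

    pop: forecast probabilities; observed: 1 if precipitation occurred, else 0.
    A "yes, rain" decision is made when pop > threshold (strict inequality).
    """
    points = []
    for t in thresholds:
        hits = misses = false_alarms = correct_negatives = 0
        for p, o in zip(pop, observed):
            act = p > t  # take protective action on this day
            if act and o:
                hits += 1
            elif o:
                misses += 1
            elif act:
                false_alarms += 1
            else:
                correct_negatives += 1
        events = hits + misses
        non_events = false_alarms + correct_negatives
        hit_rate = hits / events if events else float("nan")
        false_alarm_rate = false_alarms / non_events if non_events else float("nan")
        points.append((t, false_alarm_rate, hit_rate))
    return points


# Hypothetical threshold set: -0.001, 0.05, 0.15, ..., 0.95, 1.001
THRESHOLDS = [-0.001] + [round(0.05 + 0.1 * k, 2) for k in range(10)] + [1.001]
```

Plotting `hit_rate` against `false_alarm_rate` and annotating each point with its threshold gives a labeled ROC curve like the one described here.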

Relative Value Curves

This shows the economic value of your binary decisions, based on the same thresholds used for the ROC curve. For example, if your cost is C = $6 to bring an umbrella when you expect rain, and your loss is L = $10 when it rains on you without your umbrella, your C/L ratio is 0.6. At 0.6, you can see that your maximum Relative Value is about 0.2. The red envelope curve coincides with the curve labeled 0.750, so the maximum economic benefit occurs if you use a PoP of 75% as your threshold. This means you should bring your umbrella only when the forecast PoP is 80%, 90% or 100%. A Relative Value of 20% means the economic benefit of using these imperfect forecasts is 20% of the benefit that perfect forecasts would provide, with both benefits measured relative to acting on climatology.
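The umbrella arithmetic can be written out. A minimal sketch, assuming the standard static cost/loss model: `base_rate` is the sample frequency of rain, `cost_loss` is C/L, and expenses are per forecast in units of L. Relative value compares the expense of acting on the forecast with the cheaper of "always protect" and "never protect" (climatology) and with perfect forecasts. The function names are hypothetical, and this is not necessarily how verify.py computes the curves.

```python
def relative_value(hit_rate, false_alarm_rate, base_rate, cost_loss):
    """Relative economic value V = (E_climate - E_forecast) / (E_climate - E_perfect).

    Expenses are per forecast in units of the loss L; protecting costs
    cost_loss * L. Requires 0 < base_rate < 1 and 0 < cost_loss < 1.
    """
    s, a = base_rate, cost_loss
    e_climate = min(a, s)  # cheaper of "always protect" (a) and "never protect" (s)
    e_perfect = s * a      # protect only on the days it rains
    e_forecast = (false_alarm_rate * (1 - s) * a  # protected, but no rain
                  + hit_rate * s * a              # protected, and it rained
                  + (1 - hit_rate) * s)           # caught without an umbrella
    return (e_climate - e_forecast) / (e_climate - e_perfect)


def value_envelope(roc, base_rate, cost_loss):
    """Best (threshold, value) over ROC points given as (threshold, F, H) tuples."""
    return max(((t, relative_value(h, f, base_rate, cost_loss)) for t, f, h in roc),
               key=lambda tv: tv[1])


cost_loss = 6 / 10  # umbrella example: C = $6, L = $10
```

Evaluating `value_envelope` over a grid of C/L values traces the envelope curve; the threshold it returns at a given C/L is the one whose value curve touches the envelope there.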

Other data sets

Dear website reader, please share with me links to alternative data sets to which verify.py can be applied.

Here is one that I know about: