Munging Global COVID stats

Posted by Patrick Lam on Friday, September 4, 2020

Previously (NZ stats); banner image from US CDC:

It hasn’t been discussed in either the Canadian or NZ news from what I can see, but I’m vaguely aware that some countries are having a legit resurgence. I understand that people who are more closely connected to European countries have been talking about it more. Quebec is worrying about one, but the Canada-wide numbers are still better than those of many European countries. Of course, we should worry about what is going to happen when schools are back.

A number of countries had controlled the pandemic for a while but then had new clusters come up, most notably Australia, and particularly Melbourne in Victoria state, which is still under a lockdown as I write this. In New Zealand, Auckland also had a couple of weeks of lockdown. But I had the feeling that the numbers in New Zealand are way smaller than elsewhere, and wanted to check that.

I started looking at data manually and then went programmatic, writing some Python code to munge the data, helpfully provided in JSON format.

  • Q: How does the number of new cases/day compare to peak?

I do not believe in per-capita numbers for evaluating outbreak size; even in the United States, I think the uninfected population is essentially infinite. So I claim that what matters is the absolute size of the outbreak. And it’s generally not going to be evenly distributed across the whole country, but rather concentrated in a couple of hotspots. We saw, for instance, how New York was a hotspot for the US, and how Quebec was a hotspot for Canada. (I will agree that per-capita numbers in e.g. a city say something about how likely one is to encounter a case in that city. [Edit: A valid implication is that per-capita numbers in a city can serve as an approximation of the risk level involved in being out and around in that city; but I’ll reiterate that I don’t think that one should say that X is doing better or worse than Y because its per-capita number is lower or higher.])

So here are the results of my data munging.

| 4-Sept | cum cases | new cases/day (7 day avg) | all-time peak day | peak date | ratio now/peak |
|---|---|---|---|---|---|
| Essentially-zero club | | | | | |
| Mongolia | 306 | 0.7 | 74 | 16-May | 0.01 |
| Taiwan | 490 | 0.4 | 31 | 19-Mar | 0.01 |
| Vietnam | 1046 | 1.4 | 50 | 31-Jul | 0.03 |
| New Zealand | 1413 | 7.1 | 95 | 31-Mar | 0.08 |
| Bhutan | 227 | 6.1 | 29 | 31-Aug | 0.21 |
| China | 89986 | 24.6 | 15141 | 13-Feb | 0.00 |
| Singapore | 56908 | 48.0 | 1426 | 21-Apr | 0.03 |
| Hundreds, good compared to peak | | | | | |
| Australia | 26049 | 103.9 | 721 | 31-Jul | 0.14 |
| Ireland | 29206 | 107.6 | 1169 | 10-Apr | 0.09 |
| Sweden | 84729 | 122.1 | 1698 | 25-Jun | 0.07 |
| South Korea | 20842 | 252.1 | 909 | 29-Feb | 0.28 |
| Switzerland | 43127 | 318.1 | 1390 | 28-Mar | 0.23 |
| Canada | 130493 | 520.7 | 2760 | 04-May | 0.19 |
| Japan | 70268 | 670.7 | 2064 | 15-Aug | 0.32 |
| Hundreds, but near historical peak | | | | | |
| Greece | 10998 | 209.6 | 293 | 27-Aug | 0.72 |
| Czech Republic | 26452 | 450.3 | 679 | 4-Sep | 0.66 |
| Thousands, good compared to peak | | | | | |
| Germany | 246948 | 1063.0 | 6294 | 28-Mar | 0.17 |
| Italy | 272912 | 1280.4 | 6557 | 22-Mar | 0.20 |
| United Kingdom | 340411 | 1434.7 | 5487 | 24-Apr | 0.26 |
| Thousands, close to historical peak | | | | | |
| Israel | 125260 | 2122.4 | 3274 | 03-Sep | 0.65 |
| France | 300181 | 5783.3 | 7578 | 01-Apr | 0.76 |
| Number 1 | | | | | |
| United States | 6150655 | 40410 | 78427 | 25-Jul | 0.52 |

These are the countries that are somewhat on my radar; there are, of course, many other countries. I don’t know much about African and many Asian countries.
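The last column of the table is simply the current 7-day average divided by the all-time peak day. Here is a minimal sketch of that arithmetic together with a rough version of the grouping. The function names, the bucket boundaries, and the 0.5 cutoff for "near peak" are assumptions made for illustration; the groups in the table were assigned by hand.

```python
def ratio_now_over_peak(new_cases_7day_avg, peak_day_cases):
    """Current 7-day average divided by the worst single day on record."""
    if peak_day_cases <= 0:
        raise ValueError("peak must be positive")
    return new_cases_7day_avg / peak_day_cases


def magnitude(new_cases_7day_avg):
    """Rough order-of-magnitude bucket for daily new cases (assumed boundaries)."""
    if new_cases_7day_avg < 10:
        return "essentially zero"
    if new_cases_7day_avg < 100:
        return "double digits"
    if new_cases_7day_avg < 1000:
        return "hundreds"
    return "thousands or more"


NEAR_PEAK = 0.5  # hypothetical cutoff, chosen only for this illustration

# (country, 7-day average, all-time peak day), copied from the table.
rows = [
    ("China", 24.6, 15141),
    ("Australia", 103.9, 721),
    ("Greece", 209.6, 293),
    ("Germany", 1063.0, 6294),
]
for name, avg, peak in rows:
    r = ratio_now_over_peak(avg, peak)
    status = "near peak" if r >= NEAR_PEAK else "well below peak"
    print(f"{name}: {magnitude(avg)}, {status} (ratio {r:.2f})")
```

Note that the ratio compares a smoothed number (the 7-day average) with an unsmoothed one (the single worst day), so it tends to understate how close a country is to its smoothed peak.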

Highly-controlled: essentially-zero and double-digits.

I’ve put five countries in the essentially-zero club: Mongolia, Taiwan, Vietnam, New Zealand, and Bhutan. NZ may be one of the few countries that distinguish community transmission from imported cases; I don’t think there are good worldwide stats about that. Gotta read local news for details. NZ’s community-transmission number was 0 for 103 days. Vietnam was almost as good.

Almost as good are some countries with double-digit numbers of new cases. One could have questions about early numbers from China but I guess they’re good now. Looks like Singapore has controlled the dorm outbreak.


The next group is countries with new cases in the triple digits. There are two groups here: those that are well below their peak (Australia, Ireland, Sweden, South Korea, Switzerland, Canada, and Japan), and those that are uncomfortably close to their peak, or peaking (Greece, Czech Republic). The Czech Republic has a page where they estimate Rt, and it has been close to 1 all summer, but not quite under 1. So the number of new cases keeps on slowly increasing, from dozens (April-June) to hundreds (July onwards).
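To see why an Rt only slightly above 1 can still turn dozens into hundreds over a summer, here is a back-of-the-envelope sketch. It assumes that daily cases multiply by Rt once per generation of infections and that a generation takes about 5 days. The 5-day figure, the starting count of 50, and the Rt values are round numbers picked for illustration, not figures from the Czech page.

```python
def project_daily_cases(start_cases, rt, days, generation_days=5.0):
    """Daily cases after `days` if cases multiply by `rt` once per generation."""
    generations = days / generation_days
    return start_cases * rt ** generations


for rt in (0.9, 1.0, 1.1):
    projected = project_daily_cases(start_cases=50, rt=rt, days=90)
    print(f"Rt={rt}: about {projected:.0f} cases/day after 90 days")
```

Under these assumptions, an Rt of 1.1 multiplies daily cases between five- and sixfold in three months, while an Rt of 0.9 shrinks them to a fraction of the starting value.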


I’d read about Israel really not doing well, and I’d looked at France’s numbers. In the thousands category, Germany, Italy and the UK are well below their historical peaks, while Israel keeps on increasing (I’ve read multiple pieces about how the outbreak was controlled and then got away), and France is in the mid-thousands and near its peak.


(I had also observed that some countries had a trough and then resurged; NZ’s trough was close to 0 for over 100 days. But I had no systematic way of evaluating that. I think people who do technical analysis of stocks would have some tools for this (although I don’t believe in technical analysis itself). I evaluated it by eye and tried some calculations but didn’t get anything satisfactory.)
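One simple way to put a number on such a trough is to look for the longest run of consecutive days on which the daily count stays under some threshold, and then compare the highest value after that run with the highest value before it. This is only a sketch of that idea, not the calculation tried for this post. The function name, the threshold of 5 cases/day, and the sample series are all made up for illustration.

```python
def longest_quiet_run(daily_cases, threshold):
    """Return (start_index, length) of the longest run of values below threshold."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for i, cases in enumerate(daily_cases):
        if cases < threshold:
            if run_len == 0:
                run_start = i  # a new quiet run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # the quiet run is broken
    return best_start, best_len


# Made-up series: a first wave, a quiet stretch, then a resurgence.
series = [40, 80, 60, 20, 4, 1, 0, 0, 2, 1, 9, 15, 12]
start, length = longest_quiet_run(series, threshold=5)
peak_before = max(series[:start])
peak_after = max(series[start + length:])
print(f"quiet stretch starts at day {start} and lasts {length} days")
print(f"peak before: {peak_before}, peak after: {peak_after}")
```

The result depends heavily on the threshold, which is probably part of why eyeballing the curves is hard to replace.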


import json
from dateutil.parser import parse  # third-party package: python-dateutil


def lookup(C, data, csv):
    """Print the latest totals and the all-time peak of daily new cases for country C."""
    for country in data:
        if data[country]["location"] == C:
            biggest_date = parse('2000-01-01')
            biggest_new_cases = 0
            biggest_new_cases_date = None
            # Defaults, in case no record carries these fields.
            total_cases = 0
            new_cases_smoothed = 0
            for days in data[country]["data"]:
                thisdate = parse(days['date'])
                # Track the worst single day of raw (unsmoothed) new cases.
                if 'new_cases' in days:
                    if days['new_cases'] > biggest_new_cases:
                        biggest_new_cases_date = thisdate
                        biggest_new_cases = days['new_cases']
                # Keep the totals from the most recent record seen so far.
                if thisdate > biggest_date:
                    biggest_date = thisdate
                    if 'total_cases' in days:
                        total_cases = days['total_cases']
                    if 'new_cases_smoothed' in days:
                        new_cases_smoothed = days['new_cases_smoothed']
            if biggest_new_cases_date is None:
                # No day with a positive count, so there is no peak to compare against.
                print("{0}: no new cases recorded".format(C))
                continue
            # Note: today's 7-day average is compared with the peak single day.
            ratio_today_over_peak = new_cases_smoothed / biggest_new_cases
            if csv:
                print("\"{0}\",{1:%d}-{1:%b},{2:.0f},{3:.1f},{4:.0f},{5:%d}-{5:%b},{6:.2f}".format(
                    C, biggest_date, total_cases, new_cases_smoothed,
                    biggest_new_cases, biggest_new_cases_date, ratio_today_over_peak))
            else:
                print("{0}".format(C))
                print("date {0:%d}-{0:%b}, total cases {1:.0f}, new cases smoothed {2}".format(
                    biggest_date, total_cases, new_cases_smoothed))
                print("biggest new cases {0:.0f} on date {1:%d}-{1:%b}, today/biggest is {2:.2f}".format(
                    biggest_new_cases, biggest_new_cases_date, ratio_today_over_peak))


countries = ["Mongolia", "New Zealand", "Vietnam", "Bhutan", "Taiwan", "Singapore",
             "China", "Australia", "Ireland", "Sweden", "South Korea", "Switzerland",
             "Canada", "Japan", "Greece", "Czech Republic", "Germany", "Italy",
             "United Kingdom", "Israel", "France", "United States"]

with open('owid-covid-data.json') as jsonfile:
    data = json.load(jsonfile)
    for C in countries:
        lookup(C, data, True)
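For anyone who wants to try the logic without downloading the full file, here is a small self-contained sketch of the shape of data the script reads, as far as it can be inferred from the code: a dict with one entry per country, each holding a "location" name and a "data" list of per-day records. The `XXX` key, the country name, the `summarize` helper, and all the numbers are invented; only the field names come from the script. It uses `max` with a key function as a more compact alternative to the explicit loop.

```python
from datetime import date

# Invented sample in the shape the script reads; only the field names are real.
sample = {
    "XXX": {
        "location": "Exampleland",
        "data": [
            {"date": "2020-03-30", "total_cases": 10, "new_cases": 10, "new_cases_smoothed": 4.0},
            {"date": "2020-03-31", "total_cases": 50, "new_cases": 40, "new_cases_smoothed": 12.0},
            {"date": "2020-09-03", "total_cases": 90},  # a day with no new_cases field
            {"date": "2020-09-04", "total_cases": 94, "new_cases": 4, "new_cases_smoothed": 2.0},
        ],
    }
}


def summarize(records):
    """Return (latest record, peak record) for one country's list of daily records."""
    latest = max(records, key=lambda r: date.fromisoformat(r["date"]))
    # Only records that actually report new_cases can be the peak.
    with_cases = [r for r in records if "new_cases" in r]
    peak = max(with_cases, key=lambda r: r["new_cases"])
    return latest, peak


for entry in sample.values():
    latest, peak = summarize(entry["data"])
    ratio = latest["new_cases_smoothed"] / peak["new_cases"]
    print(entry["location"], latest["date"], peak["date"], f"{ratio:.2f}")
```

Unlike the script, this sketch assumes the latest record carries `new_cases_smoothed`; a real run over the full file would need a fallback for days where it is missing.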