Perhaps what we need is a statistician to explain about noise and anomalies.
@srw?
Not a statistician, and not srw, but if I may?
Let's start with the figures @mjr posted:
Continuing this from the duplicate discussion...
I'm such a sucker for this. Here's 5 years I can see easily:
2015 - 6 women, 3 men; (upthread)
2014 - 1 woman, 13 men; (met - city)
2013 - 5 women, 8 men;
2012 - 1 woman, 13 men;
2011 - 6 women, 10 men;
total - 19 women, 47 men... seems closer to 5/8ths than 2/3rds, but still a majority. So what are the other five years, are the pre-2011 (so pre-cycle-hire-and-superhighways) figures relevant to today and have male and female cycling rates developed at different rates?
The first thing to note is that the numbers are actually very small (this is not to detract in any way from the fact that each number represents an individual tragedy) compared to the many billions of journeys by bike that have taken place in the same period. I'll leave a space for @srw to comment about denominator neglect here...
Each fatality is a discrete and presumably independent event, so use of the Poisson distribution is appropriate. The important thing is that, for a Poisson distribution, the variance equals the mean, so the standard deviation is given by taking the square root of the total. The standard deviation is a statistical measure of the likely variability of a measured quantity. For instance, if you take many measurements of some quantity x and its mean is 10 with a std dev of 2, the true value of x is probably somewhere between 8 and 12 [1].
So, taking the totals:
Women: 19 (std dev 4.36)
Men: 47 (std dev 6.86)
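A minimal Python sketch of the square-root rule applied to the two totals above. The helper name poisson_sd is purely illustrative and not from the original post; it just assumes the counts are Poisson, as argued above.

```python
import math

def poisson_sd(count):
    """Standard deviation of a Poisson count.

    For a Poisson distribution the variance equals the mean, so the
    std dev is estimated as the square root of the observed count.
    """
    return math.sqrt(count)

# Five-year fatality totals from the figures quoted above.
totals = {"women": 19, "men": 47}

for group, count in totals.items():
    print(f"{group}: {count} (std dev {poisson_sd(count):.2f})")
```

This prints "women: 19 (std dev 4.36)" and "men: 47 (std dev 6.86)", matching the hand-worked figures.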
But we haven't taken account of the fact that there are more male cyclists than female. IIRC the TfL figure was that 25% of cyclists are female, so a quick and dirty way to allow for this is simply to scale (or normalise, if you're a physicist) all numbers to 100%. For women, that means multiplying by 4 (4 x 25 = 100); for men, multiplying by 1.333 (75 x 1.333 = 100) [2]. Thus our normalised figures are:
Women: 76 (std dev 17.44)
Men: 63 (std dev 9.14)
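The same normalisation as a short sketch, assuming the recalled 25% female / 75% male split is accurate (it is the only source of the scaling factors). The key point is that multiplying a count by a constant multiplies its standard deviation by the same constant; the function name normalise is illustrative only.

```python
import math

# Assumption: the recalled TfL split of 25% female / 75% male cyclists.
SHARE = {"women": 0.25, "men": 0.75}
TOTALS = {"women": 19, "men": 47}

def normalise(count, share):
    """Scale a fatality count to a notional 100% population.

    Returns (scaled count, scaled std dev). Scaling a count by k scales
    its Poisson std dev sqrt(count) by the same factor k.
    """
    factor = 1.0 / share
    return count * factor, math.sqrt(count) * factor

for group in TOTALS:
    value, sd = normalise(TOTALS[group], SHARE[group])
    print(f"{group}: {value:.1f} (std dev {sd:.2f})")
```

Before rounding, the men's figure comes out as 62.7, which the post rounds to 63.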
Thus the natural variations in fatalities over the 5 years for men and women, normalised to compensate for the different size of each population, lie in the ranges:
Men: 54 to 72
Women: 59 to 93
The two ranges overlap. If there were a thousand identical Londons, the fatality figures for each of them would probably lie within these ranges just through chance (but see [1] below). There is therefore no reason to believe that women are over-represented in cycling fatalities in London on the figures as presented. Natural (random) noise is large and total fatalities are (thankfully!) low: this means that if any systematic difference exists, it's buried in the noise. [3]
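The one-standard-deviation ranges and the overlap check can be reproduced with a few lines, again assuming the 25%/75% split. Note that this uses the unrounded men's figure (62.7 rather than 63), which gives the same whole-number range. The helper names are hypothetical, and an overlap of one-sigma ranges is only a rough indicator, not a formal significance test.

```python
import math

def one_sd_range(count, share):
    """Normalised count plus or minus one Poisson std dev, as (low, high)."""
    factor = 1.0 / share
    centre = count * factor
    sd = math.sqrt(count) * factor
    return centre - sd, centre + sd

def overlaps(a, b):
    """True if two (low, high) intervals share any values."""
    return max(a[0], b[0]) <= min(a[1], b[1])

men = one_sd_range(47, 0.75)    # roughly (53.5, 71.8)
women = one_sd_range(19, 0.25)  # roughly (58.6, 93.4)

print(f"men {men[0]:.0f} to {men[1]:.0f}, "
      f"women {women[0]:.0f} to {women[1]:.0f}, "
      f"overlap: {overlaps(men, women)}")
```

This prints "men 54 to 72, women 59 to 93, overlap: True".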
Note that I've used a rather crude method here. A more rigorous method would be to apply the appropriate t-test to the data to test the probability that the two populations are different. With ranges that overlap this much, it is highly unlikely to give a significant result. I'll leave that one to the proper statisticians.
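For counts like these, one common alternative to a t-test (swapped in here; it is not the method the post names) is the conditional, or exact binomial, test for two Poisson rates. If per-cyclist risk were the same for men and women, then given 66 fatalities in total, each one would be a woman with probability equal to the female share of cycling, so the number of female fatalities would follow Binomial(66, 0.25). This sketch assumes the recalled 25% share is exact and is a fair measure of exposure (it ignores, for example, differences in distance ridden). The function names are illustrative.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_two_sided_p(k_obs, n, p):
    """Exact two-sided binomial p-value: the total probability of every
    outcome no more likely than the observed one under the null."""
    p_obs = binom_pmf(k_obs, n, p)
    # Small relative tolerance so floating-point ties count as ties.
    return sum(
        binom_pmf(k, n, p)
        for k in range(n + 1)
        if binom_pmf(k, n, p) <= p_obs * (1 + 1e-7)
    )

women, men = 19, 47
female_share = 0.25  # assumption: recalled TfL figure, treated as exact

n = women + men
p_value = exact_two_sided_p(women, n, female_share)
print(f"expected women: {n * female_share:.1f}, observed: {women}, p = {p_value:.2f}")
```

Under these assumptions 16.5 female fatalities would be expected against 19 observed, and the p-value comes out well above the conventional 0.05 threshold, consistent with the conclusion that any difference is buried in the noise. (math.comb needs Python 3.8 or later.)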
Oh, and just to reiterate - cycling, even in central London, is a very safe activity.
[1] Pedantic statistician's note: assuming a normal distribution, there is a 68% probability of x being somewhere between 8 and 12. Therefore, there is a 32% chance of x lying outside this range.
[2] Yes, I could have saved myself some effort and just multiplied the female data by three. But then, I am a physicist...
[3] Alert (or possibly awake) readers will have noticed that I've used standard deviation in a rather different way from how I defined it above. This is because there is no uncertainty in the number of fatalities, but there is a large and random year-to-year variation. Standard deviation in this case refers to the amount of randomness that can be expected in these figures.
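The 68% in footnote [1] comes from the normal distribution: the probability of a value landing within one standard deviation of its mean is erf(1/sqrt(2)). A quick check using only the Python standard library:

```python
import math

# P(|Z| < 1) for a standard normal Z equals erf(1 / sqrt(2)).
within = math.erf(1 / math.sqrt(2))
print(f"within 1 sd: {within:.1%}, outside: {1 - within:.1%}")
```

This prints "within 1 sd: 68.3%, outside: 31.7%", i.e. the 68% / 32% split quoted in the footnote.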
Edited to correct a silly mistake in the normalised female fatality number, which decreases the difference between men and women.