Comments on Amar Sagoo: Making sense of standard deviation

Thanks for the explanation. Not fully there, but s...

2023-06-30T15:41:56.542+02:00

Thanks for the explanation. Not fully there, but somewhat understood : )

article was written in 2007 , and its 2023 now and...

2023-06-30T14:39:41.606+02:00

The article was written in 2007, it's 2023 now, and this is still the best explanation I found on the web. Thanks a lot, Amar.

"Model dispersion for normally distributed ph...

2023-06-30T14:38:27.227+02:00

"Model dispersion for normally distributed phenomena" this is exactly the type of jargon amar tried to avoid.

From stats exchange: ‘the standard deviation is a ...

2022-10-25T08:59:18.452+02:00

From stats exchange: ‘the standard deviation is a term that arises out of independent random variables being summed together. So, I disagree with some of the answers given here - standard deviation isn't just an alternative to mean deviation which "happens to be more convenient for later calculations". Standard deviation is the right way to model dispersion for normally distributed phenomena.’

The higher powers can be used to define moments of...

2022-05-07T02:04:49.780+02:00

The higher powers can be used to define moments of distribution. Skew, Kurtosis etc.

Thank you for saving my sanity. This is an amazing...

2020-11-29T17:09:45.282+01:00

Thank you for saving my sanity. This is an amazing explanation.

Thanks Amar!

2019-04-27T06:20:39.664+02:00

Thanks Amar!

For days, I have been trying to figure out exactly...

2018-08-15T16:44:49.435+02:00

For days, I have been trying to figure out exactly why the difference was to be squared, and your example nails it! As to squaring or cubing or higher levels, I believe that it would amplify the results even more. Squaring should be sufficient for intuition for statisticians, I think. I will however try to extend your example using cubes and see how it works out. But thank you so much for this really beautifully explained article!

Two comments: Regarding the use of higher power...

2016-01-31T19:26:23.320+01:00

Two comments:

Regarding the use of higher powers to "amplify the tails," so to speak: this is known but is not commonly used. By using the third power, you will get a measure of how skewed the data is. Roughly speaking this would be a measure of the spread between mean and median.

Using the fourth power gives a measure called kurtosis. This measure is roughly intended to give some idea of how heavy the tails are (or how many data points are a long way from the mean).
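To make the third and fourth powers concrete, here is a minimal sketch that computes the standardized third and fourth moments (the usual textbook definitions of skewness and kurtosis, using population formulas). The two small data sets and the function name are made up for illustration.

```python
import math

def standardized_moment(data, k):
    """k-th central moment divided by sd**k (population formulas)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return sum((x - mean) ** k for x in data) / n / sd ** k

# Illustrative data: one symmetric set, one with a long right tail.
symmetric = [-2, -1, 0, 1, 2]
right_tailed = [0, 0, 0, 1, 9]

skew_symmetric = standardized_moment(symmetric, 3)
skew_right = standardized_moment(right_tailed, 3)
kurt_symmetric = standardized_moment(symmetric, 4)
kurt_right = standardized_moment(right_tailed, 4)

print("skewness:", skew_symmetric, skew_right)
print("kurtosis:", kurt_symmetric, kurt_right)
```

The symmetric set has zero skewness, while the set with one far-out point has positive skewness and a larger kurtosis.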

Regarding the intuition behind the bias of the estimated standard deviation, the simple answer is that the average used to calculate the standard deviation has an error in it. When you account for the effect of this error on the estimated standard deviation, you get the N-1 term.

More technically, what is happening is this: if the assumption is that all the data points are drawn (with replacement, to make it simple) from a distribution with a mean and a variance, then each drawn data point can take any of the potential values in the sample space of the distribution. This is absolutely true for the first N-1 draws for a sample of size N. However, it is not true for the Nth draw, because the Nth draw is constrained to the single value that produces the final average that was used as the estimated mean to calculate the sample standard deviation. This means that the data point is drawn from a different sample space.

Therefore, since 1) the standard deviation really is just the square root of an average of the squared differences between the data points and the estimated mean; and 2) to be meaningful, the data points should be drawn independently from the same sample space, it is appropriate to adjust the calculation by not counting the last, constrained data point.

Note that when you have access to the entire population this problem goes away, which is why there is the difference between population variance and sample variance.
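The effect of the N-1 term can be checked exactly on a tiny case. The sketch below assumes a population of six equally likely values (a fair die, chosen only for illustration), enumerates every possible sample of size 3 drawn with replacement, and averages the variance estimate computed with N and with N-1 in the denominator.

```python
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # illustrative population: a fair die
n = 3                            # sample size

def mean(values):
    return sum(values) / len(values)

pop_mean = mean(population)
pop_var = mean([(x - pop_mean) ** 2 for x in population])

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)

# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n = mean([sum_sq_dev(s) / n for s in samples])
avg_div_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

print("population variance:     ", pop_var)
print("average estimate with N:  ", avg_div_n)
print("average estimate with N-1:", avg_div_n_minus_1)
```

Dividing by N comes out too small on average, by a factor of (N-1)/N, because each sample's deviations are measured from its own mean rather than the true mean. Dividing by N-1 recovers the population variance on average.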

I know this is a lousy explanation, but it is the best I got.

Sorry

Awesome explanation. Taking a basic stats class at...

2016-01-06T09:54:04.022+01:00

Awesome explanation. Taking a basic stats class at UC Irvine and this just made it click!

@SAS: Ah, I think I understand what you're say...

2014-01-15T22:13:53.746+01:00

@SAS: Ah, I think I understand what you're saying now. Yes, the numbers you suggested have more variability than 2s and -2s, but they're also closer to the mean on average (1.99 vs 2.00). I chose 1s and 3s because they have the same mean deviation as the 2s, and I wanted to isolate the effect of measuring variability.

@SAS: Perhaps I'm misunderstanding, but I don't...

2014-01-15T22:00:41.084+01:00

@SAS: Perhaps I'm misunderstanding, but I don't get the result you're getting with your example. For {-2.1, -2.1, -1.88, -1.88, 1.88, 1.88, 2.1, 2.1}, I get a mean absolute deviation of 1.990 and an RMS deviation of 1.993.
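For anyone who wants to check those two figures, a short sketch (the variable names are just illustrative):

```python
import math

data = [-2.1, -2.1, -1.88, -1.88, 1.88, 1.88, 2.1, 2.1]

mean = sum(data) / len(data)
# Mean absolute deviation: average distance from the mean.
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)
# RMS deviation: square root of the average squared distance from the mean.
rms_dev = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

print(f"{mean_abs_dev:.3f} {rms_dev:.3f}")  # prints "1.990 1.993"
```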

Anyway, I'm looking into the concerns people have raised about using the squares, and will add an explanation/correction to the article once I've understood this.

Thanks

Kickass man! Good job

2013-07-06T19:15:39.713+02:00

Kickass man! Good job

A concise and easy to use explanation. Many thanks...

2013-07-06T15:47:28.072+02:00

A concise and easy to use explanation. Many thanks from a frustrated student, who is sitting in his flat despite the beautiful weather, trying to grapple with statistics...

also, we can get 2 different graphs that have the ...

2012-10-16T21:09:07.920+02:00

Also, we can get two different graphs that have the same standard deviation but different mean absolute deviations.
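A small sketch of such a pair, using made-up values: four points at ±2 versus points at ±1 and ±√7. Both sets have a standard deviation of 2 (population formula), but their mean absolute deviations differ.

```python
import math

def mean_abs_dev(data):
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def std_dev(data):
    """Population standard deviation (divides by N)."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

r = math.sqrt(7)
set_a = [-2, -2, 2, 2]   # illustrative data
set_b = [-r, -1, 1, r]   # illustrative data, chosen so the squares average to 4

print("std dev:     ", std_dev(set_a), std_dev(set_b))
print("mean abs dev:", mean_abs_dev(set_a), mean_abs_dev(set_b))
```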

something i don't understand, if we want to am...

2012-10-16T20:57:17.395+02:00

Something I don't understand: if we want to amplify error, then why don't we sum the deviations raised to the power 4 and then take the fourth root? Or even the absolute value raised to the power 3, then the third root?
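As a sketch of what that would look like, the function below generalizes the idea to any power p: average the absolute deviations raised to p, then take the p-th root. The function name is hypothetical; the two data sets are the 2s/-2s and the 1s/3s discussed in the article's example.

```python
def power_deviation(data, p):
    """p-th root of the mean of |x - mean|**p.

    p=1 gives the mean absolute deviation, p=2 the standard deviation.
    """
    n = len(data)
    mean = sum(data) / n
    return (sum(abs(x - mean) ** p for x in data) / n) ** (1 / p)

flat = [-2, -2, -2, -2, 2, 2, 2, 2]
jagged = [-3, -3, -1, -1, 1, 1, 3, 3]

for p in (1, 2, 3, 4):
    print(p, power_deviation(flat, p), power_deviation(jagged, p))
```

For the flat set every power gives 2, while for the jagged set the result keeps growing with p and drifts toward the largest single deviation (3). So higher powers do amplify further, but they also make the measure increasingly dominated by the most extreme points.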

Thanks for the explanation. It is a great post.

2012-10-09T09:39:58.476+02:00

Thanks for the explanation. It is a great post.

Assuming a relatively "more" jagged dist...

2012-07-26T16:02:29.329+02:00

Assuming a relatively "more" jagged distribution, doesn't the idea fall apart? In the second diagram, you have chosen all points falling on 1 or 3. Imagine that you replace the value 3 by 2.1 and 1 by 1.88. So 4 points on 2.1 and 4 points on 1.88 as against 8 points with dev 2.
As per your theory/reasoning, we should expect the less jagged curve of eight 2s to have a lower std dev than the more jagged one with values at 2.1 and 1.88. However, it is just the reverse (the std dev comes out to 1.99 for the jagged curve). Note that the mean is again 0. That, I believe, is the fallacy in your argument. You have chosen an example that supports the theory and used it as 'proof'; however, that doesn't hold. Please point out if I am wrong.
P.S. I stumbled upon this blog in search of the same explanation (why std dev rather than mean dev?) but I cannot accept your explanation.

gr8 post man.. its really intuitive. please post i...

2012-07-24T23:20:56.392+02:00

gr8 post man.. its really intuitive. please post ideas about other theories n concepts as well.. you are doing a great job.

Naveen

The article is indeed valuable, still the whole po...

2012-05-22T11:13:58.456+02:00

The article is indeed valuable, still the whole point of SD remains unclear to me.
1. What type of real-world observation demands that the "jaggedness" of the sine wave (Amar's reply, 16 September, 2007 22:48) be discriminated?
2. Why would we still measure this "jagged" behaviour by the same variable (dubbing Anonymous's post, 28 December, 2010 12:52)?
Isn't it better to use something like
(sum of |X_(i+1) - X_i|) divided by (n-1)?
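A minimal sketch of that suggestion, with made-up data and hypothetical function names. It shows one property worth noting: the successive-difference measure depends on the order of the values, whereas the standard deviation does not.

```python
import math

def mean_abs_successive_diff(xs):
    """(sum of |x[i+1] - x[i]|) / (n - 1), as proposed above."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:])) / (len(xs) - 1)

def std_dev(xs):
    """Population standard deviation (divides by N)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

smooth = [1, 2, 3, 4, 5, 6]      # illustrative data, steadily rising
reordered = [1, 6, 2, 5, 3, 4]   # same values, zig-zag order

print("successive diff:", mean_abs_successive_diff(smooth),
      mean_abs_successive_diff(reordered))
print("std dev:        ", std_dev(smooth), std_dev(reordered))
```

So the two quantities answer different questions: one describes how much a sequence jumps from step to step, the other how far the values sit from their mean regardless of order.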

Approx 4.5 years after you originally posted this ...

2012-04-16T00:26:36.089+02:00

Approx 4.5 years after you originally posted this and it is still providing value. Thank you very much.

Truly amazing! You've explained a concept that...

2012-02-04T18:14:31.524+01:00

Truly amazing! You've explained a concept that baffles so many, so concisely and clearly! I cannot thank you enough.

Thank you so much for writing this! Is there any ...

2011-12-21T13:55:10.774+01:00

Thank you so much for writing this!

Is there any chance you'll post more such explanations of mathematical concepts?

For a good discussion of the mean deviation and wh...

2011-12-05T21:10:03.006+01:00

For a good discussion of the mean deviation and why it is superior to standard deviation in dealing with real world data, check out Stephen Gorard's paper here:
http://www.leeds.ac.uk/educol/documents/00003759.htm

Key points are that
- The standard deviation is only reliable when data is normally distributed; if it is not (and it usually isn't), mean deviation is superior.
- Standard deviation amplifies errors, which Amar implied was a good thing for some reason, but in reality this means that outlying data has a disproportionate effect on the result. Mean deviation is much less affected by the odd wacky data point.
- Mean deviation is much easier to understand & could help far more people to actually understand and use statistics.
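The point about outliers can be illustrated with a small sketch. The numbers are made up for illustration and are not taken from Gorard's paper: one value in a tidy data set is replaced by a wild one, and both measures are recomputed.

```python
import math

def mean_dev(data):
    """Mean absolute deviation about the mean."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def std_dev(data):
    """Population standard deviation (divides by N)."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

clean = [8, 9, 10, 11, 12]          # illustrative data
with_outlier = [8, 9, 10, 11, 62]   # same data, one wacky point

md_growth = mean_dev(with_outlier) / mean_dev(clean)
sd_growth = std_dev(with_outlier) / std_dev(clean)

print("mean deviation grew by a factor of", md_growth)
print("std deviation grew by a factor of ", sd_growth)
```

Both measures increase, but the standard deviation increases by the larger factor, since squaring gives the single extreme point more weight.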

Good one. thanks

2011-08-13T10:17:48.744+02:00

Good one. thanks