tag:blogger.com,1999:blog-9250821.post9070517618364765528..comments2019-06-10T08:09:04.244+02:00Comments on Amar Sagoo: Making sense of standard deviationAmarhttp://www.blogger.com/profile/06287136617423704188noreply@blogger.comBlogger60125tag:blogger.com,1999:blog-9250821.post-83775077920795423592019-04-27T06:20:39.664+02:002019-04-27T06:20:39.664+02:00Thanks Amar!Thanks Amar!Unknownhttps://www.blogger.com/profile/05512755703813529246noreply@blogger.comtag:blogger.com,1999:blog-9250821.post-39114742884853456802018-08-15T16:44:49.435+02:002018-08-15T16:44:49.435+02:00For days, I have been trying to figure out exactly...For days, I have been trying to figure out exactly why the difference was to be squared, and your example nails it! As to squaring or cubing or higher levels, I believe that it would amplify the results even more. Squaring should be sufficient for intuition for statisticians, I think. I will however try to extend your example using cubes and see how it works out. But thank you so much for this really beautifully explained article!Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-20818336494919435402016-01-31T19:26:23.320+01:002016-01-31T19:26:23.320+01:00Two comments:
Regarding the use of higher power...Two comments: <br /><br />Regarding the use of higher powers to "amplify the tails," so to speak: this is known but is not commonly used. By using the third power, you will get a measure of how skewed the data is. Roughly speaking, this would be a measure of the spread between mean and median. <br /><br />Using the fourth power gives a measure called kurtosis. This measure is roughly intended to give some idea of how heavy the tails are (or how many data points lie far from the mean).<br /><br />Regarding the intuition for the bias of the estimated standard deviation, the simple answer is that the average used to calculate the standard deviation has an error in it. When you account for the effect of this error on the estimated standard deviation, you get the N-1 term. <br /><br />More technically, what is happening is this: if the assumption is that all the data points are drawn (with replacement, to make it simple) from a distribution with a mean and a variance, then the assumption is that the drawn value of each data point can be any of the potential values in the sample space of the distribution. This is absolutely true for the first N-1 draws for a sample of size N. However, this is not true for the Nth draw, because the Nth draw is constrained to a sample space of the single value that sets the final average that was used as the estimated mean to calculate the sample standard deviation. This means that the last data point is drawn from a different sample space. 
<br /><br />Therefore, since 1) the standard deviation really is just an average of the square of the difference between the data points and the estimated mean; and, 2) to be meaningful, the data points should be drawn independently from the same sample space, then it is appropriate to adjust the calculation by not counting the last, constrained data point.<br /><br />Note that when you have access to the entire population this problem goes away, which is why there is the difference between population variance and sample variance. <br /><br />I know this is a lousy explanation but it is the best I've got.<br /><br />SorryAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-7187283153865569022016-01-06T09:54:04.022+01:002016-01-06T09:54:04.022+01:00Awesome explanation. Taking a basic stats class at...Awesome explanation. Taking a basic stats class at UC Irvine and this just made it click!Josenoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-91326016972340174982014-01-15T22:13:53.746+01:002014-01-15T22:13:53.746+01:00@SAS: Ah, I think I understand what you're say...@SAS: Ah, I think I understand what you're saying now. Yes, the numbers you suggested have more variability than 2s and -2s, but they're also closer to the mean on average (1.99 vs 2.00). I chose 1s and 3s because they have the same mean deviation as the 2s, and I wanted to isolate the effect of measuring variability.Amarhttps://www.blogger.com/profile/06287136617423704188noreply@blogger.comtag:blogger.com,1999:blog-9250821.post-63733407731240654722014-01-15T22:00:41.084+01:002014-01-15T22:00:41.084+01:00@SAS: Perhaps I'm misunderstanding, but I don&...@SAS: Perhaps I'm misunderstanding, but I don't get the result you're getting with your example. 
For {-2.1, -2.1, -1.88, -1.88, 1.88, 1.88, 2.1, 2.1}, I get a mean absolute deviation of 1.990 and an RMS deviation of 1.993.<br /><br />Anyway, I'm looking into the concerns people have raised about using the squares, and will add an explanation/correction to the article once I've understood this.<br /><br />ThanksAmarhttps://www.blogger.com/profile/06287136617423704188noreply@blogger.comtag:blogger.com,1999:blog-9250821.post-88891756319522425892013-07-06T19:15:39.713+02:002013-07-06T19:15:39.713+02:00Kickass man! Good jobKickass man! Good jobAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-39411767751262914362013-07-06T15:47:28.072+02:002013-07-06T15:47:28.072+02:00A concise and easy to use explanation. Many thanks...A concise and easy to use explanation. Many thanks from a frustrated student, who is sitting in his flat despite the beautiful weather, trying to grapple statistics...simonhttp://www.berlin.denoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-42397410137741869202012-10-16T21:09:07.920+02:002012-10-16T21:09:07.920+02:00also, we can get 2 different graphs that have the ...also, we can get 2 different graphs that have the same standard deviation but different mean absolute deviationAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-685318083391947772012-10-16T20:57:17.395+02:002012-10-16T20:57:17.395+02:00something i don't understand, if we want to am...something i don't understand, if we want to amplify error then why don't we sum deviations raised to the power 4 then take the fourth root? or even absolute of power 3 then third rootAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-29352638923644959122012-10-09T09:39:58.476+02:002012-10-09T09:39:58.476+02:00Thanks for the explanation. It is great post.Thanks for the explanation. 
It is great post.fdrnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-16951942165193585432012-07-26T16:02:29.329+02:002012-07-26T16:02:29.329+02:00Assuming a relatively "more" jagged dist...Assuming a relatively "more" jagged distribution, doesn't the idea fall apart? In the second diagram, you have chosen all points falling on 1 or 3. Imagine that you replace the value 3 by 2.1 and 1 by 1.88. So there are 4 points on 2.1 and 4 points on 1.88, as against 8 points with dev 2.<br />As per your theory/reasoning, we should expect the less jagged eight-2's curve to have a smaller std dev than the other jagged one with values at 2.1 and 1.88. However, it is just the reverse (the std dev comes to 1.99 for the jagged curve). Note that the mean is again 0. That, I believe, is the fallacy in your argument. You have chosen an example that supports the theory and used it as 'proof'; however, that doesn't hold. Please point out if I am wrong.<br />P.S. I stumbled upon this blog in search of the same explanation (why std dev rather than mean dev?) but I cannot accept your explanation.SASnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-24849506373200340652012-07-24T23:20:56.392+02:002012-07-24T23:20:56.392+02:00gr8 post man.. its really intuitive. please post i...gr8 post man.. its really intuitive. please post ideas about other theories n concepts as well.. you are doing a great job.<br /><br />NaveenAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-69136286758113559752012-05-22T11:13:58.456+02:002012-05-22T11:13:58.456+02:00The article is indeed valuable, still the whole po...The article is indeed valuable, still the whole point of SD remains unclear to me.<br />1. What type of real-world observation demands that "jaggedness" of the sine wave (Amar's reply 16 September, 2007 22:48) to be discriminated?<br />2. 
Why would we still measure this "jagged" behaviour by the same variable (dubbing Anonymous's post 28 December, 2010 12:52)?<br />Isn't it better to use something like <br />(sum of |X_{i+1} - X_i|) divided by (n-1)?Nharahttps://www.blogger.com/profile/13316496728921574671noreply@blogger.comtag:blogger.com,1999:blog-9250821.post-10667122102313154232012-04-16T00:26:36.089+02:002012-04-16T00:26:36.089+02:00Approx 4.5 years after you originally posted this ...Approx 4.5 years after you originally posted this and it is still providing value. Thank you very much.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-39354531289877202112012-02-04T18:14:31.524+01:002012-02-04T18:14:31.524+01:00Truly amazing! You've explained a concept that...Truly amazing! You've explained a concept that baffles so many, so concisely and clearly! I cannot thank you enough.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-3350560013049752652011-12-21T13:55:10.774+01:002011-12-21T13:55:10.774+01:00Thank you so much for writing this!
Is there any ...Thank you so much for writing this!<br /><br />Is there any chance you'll post more such explanations of mathematical concepts?Louishttps://www.blogger.com/profile/01353159346340853191noreply@blogger.comtag:blogger.com,1999:blog-9250821.post-17995208476660259972011-12-05T21:10:03.006+01:002011-12-05T21:10:03.006+01:00For a good discussion of the mean deviation and wh...For a good discussion of the mean deviation and why it is superior to standard deviation in dealing with real-world data, check out Stephen Gorard's paper here:<br /><a href="http://www.leeds.ac.uk/educol/documents/00003759.htm" rel="nofollow">http://www.leeds.ac.uk/educol/documents/00003759.htm</a><br /><br />Key points are: <br />- The standard deviation is only reliable when data is normally distributed; if it is not (and it usually isn't), mean deviation is superior. <br />- Standard deviation amplifies errors, which Amar implied was a good thing for some reason, but in reality this means that outlying data has a disproportionate effect on the result. Mean deviation is much less affected by the odd wacky data point.<br />- Mean deviation is much easier to understand and could help far more people to actually understand and use statistics.Mr Dennishttps://www.blogger.com/profile/08082162226406043486noreply@blogger.comtag:blogger.com,1999:blog-9250821.post-70196669464636763072011-08-13T10:17:48.744+02:002011-08-13T10:17:48.744+02:00Good one. thanksGood one. thanksNithinnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-77260823511002997352011-04-05T15:26:04.592+02:002011-04-05T15:26:04.592+02:00According to the central limit theorem, for large eno...According to the central limit theorem, for large enough N (in practice almost for any N) the deviations will be distributed around the mean value according to the normal law. The latter can be fully characterized by the mean value and the standard deviation calculated with the formula containing the squares. 
That's why the definition using the squares is so special.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-89596100072731422042011-03-25T08:29:17.531+01:002011-03-25T08:29:17.531+01:00Thanks. Very well explained!Thanks. Very well explained!Manjeet Dahiyahttp://manjeetdahiya.comnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-51984190708770089542011-01-29T02:53:03.725+01:002011-01-29T02:53:03.725+01:00@anonymous 28 July 2010. Thanks for the link! To s...@anonymous 28 July 2010. Thanks for the link! To summarize for those who didn't click it or don't want to read that much: standard deviation and mean deviation are both stable indicators of variability in a sample set. Mean deviation is actually better than standard deviation for real-life data since it is less likely to magnify error values. However, the main advantage of mean deviation is that it has a clear, intuitive meaning. As others have pointed out, you could use cubes and cube roots and it would work too, but what would the number mean?<br /><br />Thanks to Amar for getting me started, thanks to anonymous for finishing the job.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-52830747677813924852011-01-27T09:50:40.076+01:002011-01-27T09:50:40.076+01:00This is really a great explanation. Thanks for it....This is really a great explanation. Thanks for it.Apple Grewhttps://www.blogger.com/profile/17329934024949552966noreply@blogger.comtag:blogger.com,1999:blog-9250821.post-90569227321511187102011-01-15T02:46:43.552+01:002011-01-15T02:46:43.552+01:00Thanks for the refresher course, it's been yea...Thanks for the refresher course, it's been years since that was important to me. I haven't used it since I graduated.<br />The closest I've come to statistics since then was charting the difference from this week's results to a 13-week median value.<br />Originally my manager complained about the mainframe going down and not being notified about it. This worked great. 
An unintended benefit was that it became a trend analysis tool. (More or fewer people were on that mainframe this week compared with earlier results.)Kennoreply@blogger.comtag:blogger.com,1999:blog-9250821.post-56626580584610613672011-01-14T19:01:42.220+01:002011-01-14T19:01:42.220+01:00I agree, "convenience" seems to be the e...I agree, "convenience" seems to be the explanation, since we are talking about the pre-computer era. A square root is easy to perform "by hand". The higher the exponent, the harder it is to do.<br />On the other hand, as you use higher exponents, the deviation approaches the highest absolute difference found in the data. It becomes numerically equal to it when you use <b>∞</b> as your exponent.Anonymousnoreply@blogger.com
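The point about higher exponents in the last comment can be checked numerically. The sketch below is only an illustration, not anything from the article: the function name `power_mean_deviation` is made up here, and it assumes the generalised deviation is defined as the p-th root of the mean of the absolute deviations raised to the power p, so that p = 1 is the mean absolute deviation and p = 2 is the RMS (population standard) deviation. The data set is the one quoted in Amar's reply to SAS.

```python
def power_mean_deviation(data, p):
    """Return (mean(|x - mean| ** p)) ** (1 / p) for a list of numbers."""
    centre = sum(data) / len(data)
    devs = [abs(x - centre) for x in data]
    largest = max(devs)
    if largest == 0:
        return 0.0
    # Factor out the largest deviation so that big exponents cannot overflow.
    scaled_mean = sum((d / largest) ** p for d in devs) / len(devs)
    return largest * scaled_mean ** (1 / p)


# Data set quoted in Amar's reply to SAS.
data = [-2.1, -2.1, -1.88, -1.88, 1.88, 1.88, 2.1, 2.1]

for p in (1, 2, 4, 100, 1000):
    print(f"p = {p:>4}: {power_mean_deviation(data, p):.3f}")

largest_deviation = max(abs(x) for x in data)  # the mean is 0 here
print(f"largest absolute deviation: {largest_deviation:.3f}")
```

With p = 1 and p = 2 this should reproduce the 1.990 and 1.993 figures Amar reports, and as p grows the result creeps up towards the largest absolute deviation, 2.1, without ever exceeding it.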