UNDERSTANDING A BOX AND WHISKER PLOT
Occasionally you might see a 'box and whisker plot' in my posts. Sounds like the names of your dog and cat, doesn't it? Well, no: they are far more interesting, although perhaps not so cute.
I'm going to share a number of explanations from various sources.
The first can be found here: http://flowingdata.com/2008/02/15/how-to-read-and-use-a-box-and-whisker-plot/
The authors briefly explain that:
"The box-and-whisker plot is an exploratory graphic, created by John W. Tukey, used to show the distribution of a dataset (at a glance). Think of the type of data you might use a histogram with, and the box-and-whisker (or box plot, for short) could probably be useful."
The example used here is burger consumption.
Box Plot Explained (from FlowingData.com)
"Let's say we ask 2,852 people (and they miraculously all respond) how many hamburgers they've consumed in the past week. We'll sort those responses from least to greatest and then graph them with our box-and-whisker.
Take the top 50% of the group (1,426) who ate more hamburgers; they are represented by everything above the median (the white line). Those in the top 25% of hamburger eating (713) are shown by the top "whisker" and dots. Dots represent those who ate a lot more than normal or a lot less than normal (outliers). If more than one outlier ate the same number of hamburgers, dots are placed side by side."
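To make the split into halves and quarters concrete, here is a minimal sketch in Python (used here purely for illustration, not R) that sorts a small set of weekly burger counts and computes the five numbers a box plot is built from: the minimum, first quartile, median, third quartile and maximum. The data and the function name `five_number_summary` are hypothetical, invented for this example. The quartiles come from `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions, so other tools may draw slightly different box edges.

```python
import statistics

# Hypothetical weekly burger counts for ten respondents (invented for
# illustration; not the 2,852-person survey from the quoted article).
burgers = [3, 0, 12, 2, 1, 5, 2, 4, 1, 3]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Quartiles use Python's 'inclusive' method, one common convention;
    other software may give slightly different Q1/Q3 values.
    """
    ordered = sorted(values)  # sort from least to greatest, as in the example
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

print(five_number_summary(burgers))  # (0, 1.25, 2.5, 3.75, 12)
```

For these made-up values the box would run from 1.25 to 3.75 with the median line at 2.5, and exactly half of the ten respondents ate more than 2.5 burgers.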
The authors also explain how a box plot reveals skewed data: "The box-and-whisker of course shows you more than just four split groups. You can also see which way the data sways. For example, if there are more people who eat a lot of burgers than eat a few, the median is going to be higher or the top whisker could be longer than the bottom one. Basically, it gives you a good overview of the data's distribution."
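The whisker-length reading of skew can be sketched the same way. The hypothetical helpers below, again in Python with invented data, measure each whisker from the edge of the box to the extreme value (assuming whiskers that run to the lowest and highest observations) and report which one is longer. This is a rough visual cue, not a formal skewness statistic.

```python
import statistics

# Same invented burger counts as before (hypothetical data).
burgers = [3, 0, 12, 2, 1, 5, 2, 4, 1, 3]

def whisker_lengths(values):
    """Return (bottom, top) whisker lengths, with whiskers drawn to the
    minimum and maximum values (no outlier trimming)."""
    ordered = sorted(values)
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return q1 - ordered[0], ordered[-1] - q3

def skew_hint(values):
    """Rough cue only: a longer top whisker suggests right skew."""
    bottom, top = whisker_lengths(values)
    if top > bottom:
        return "right-skewed (top whisker longer)"
    if bottom > top:
        return "left-skewed (bottom whisker longer)"
    return "roughly symmetric whiskers"

print(whisker_lengths(burgers))  # (1.25, 8.25)
print(skew_hint(burgers))        # right-skewed (top whisker longer)
```

Here the top whisker (8.25) is far longer than the bottom one (1.25), driven largely by the one made-up respondent who ate 12 burgers.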
According to Bryman and Cramer (1997), in their book 'Quantitative Data Analysis with SPSS for Windows', the box contains the middle 50% of observations. The lower end of the box (its base) is therefore the first quartile and the top of the box is the third quartile, so the box itself spans the interquartile range. The line inside the box is the median (the middle value, not the average, remember), while the broken lines above and below the box (the whiskers) reach up to the highest and down to the lowest values respectively. Outliers, as explained above, are usually shown as dots above (or below) the whiskers. The box plot provides information about the shape and dispersion of a distribution (Bryman and Cramer, 1997). It's important to watch where most of the data points lie, both in and out of the box, and where the median sits within the box.
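Bryman and Cramer describe whiskers running to the lowest and highest values, with outliers drawn as separate dots. One widely used way to decide which points count as outliers is Tukey's fence rule: any value more than 1.5 × IQR below the first quartile or above the third quartile is plotted individually, and each whisker stops at the most extreme value still inside the fences. This rule is an assumption brought in here, not something stated in the quoted sources; the sketch below applies it in Python to the same invented data, with a hypothetical function name `box_plot_stats`.

```python
import statistics

# Same invented burger counts as before (hypothetical data).
burgers = [3, 0, 12, 2, 1, 5, 2, 4, 1, 3]

def box_plot_stats(values, k=1.5):
    """Numbers behind one box using Tukey's k * IQR fences (an assumed
    convention; k = 1.5 is the usual choice)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outlying values
        "outliers": outliers,                  # would be drawn as dots
    }

print(box_plot_stats(burgers))
```

With these values the interquartile range is 2.5, the fences sit at -2.5 and 7.5, so the whiskers end at 0 and 5 and the 12 is shown as a lone outlier dot. Dropping the fences and running the whiskers to the minimum and maximum, as in Bryman and Cramer's description, would instead stretch the top whisker all the way to 12.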
Hopefully I can add further examples later or return to this topic on another occasion. Bryman and Cramer note that it is a shame most people don't understand quartiles, as the box plot is otherwise a good way of interpreting a distribution. Another statistical tool, the stem-and-leaf display, is reportedly slightly easier to understand, but with the power of 'R' there are likely many other ways to display this data, which I hope to discover as I work my way through the lessons.