P-Value in Excel
Excel Conditional Formatting for Statistical Significance (posted on June 3)

This summer I am taking a course on measurement theory … which is as difficult as it sounds. But one of the complications of the course has little to do with sophisticated statistics or management theories. It is a problem faced by analysts everywhere, at any level of analysis: formatting and presenting the numbers takes almost as much effort as actually analyzing them.
In this seminar, we are expected to mimic the statistical reporting style of a top management publication, The Academy of Management Journal (AMJ). This insistence has become a running joke among my classmates; I even made a meme about it. But data formatting and presentation are essential. Copy-and-pastes from statistical software output are not allowed.
We must create a narrative with the data and make it easy to digest, and Excel is a great tool for data presentation. Here I show how Excel can be used to automate a common formatting practice in statistical reporting. This is not a post about statistics. Some prior knowledge of the subject helps, but if you have no background, you can follow along with the Excel steps all the same. It is common to report statistically significant p-values with an asterisk.
You can use conditional formatting in Excel to automate this process.
Below is a mock regression predicting office building sales prices. Easy enough to read this table. But imagine you are skimming dozens of these, as a journal referee or grant reviewer might. You want to fly through these findings and know what is statistically significant right off the bat.
Conditional Formatting, Always Significant!

With conditional formatting, we can add an asterisk to every value below a cutoff. I also change the number format from General to a custom format that displays the value to three decimal places with no leading zero. P-values always fall between 0 and 1, so the leading zero is a waste of space! Now I do the same for the values less than the cutoff in cell H3. You may have noticed that the added asterisk throws off the alignment of values in column E.
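The display logic these custom formats produce can be sketched in Python as a rough illustration; it is not Excel's formatting engine. The sketch assumes the custom format is `.000` (three decimals, no leading zero), that each asterisk marks one cutoff the p-value falls below, and that the cutoffs 0.05 and 0.01 stand in for the values kept in cells such as H3 (both hypothetical). It also pads the unstarred positions with spaces, which is the alignment problem described above.

```python
def format_p_value(p: float, cutoffs: tuple = (0.05, 0.01), decimals: int = 3) -> str:
    """Format a p-value like the custom number format .000 plus asterisks.

    The default cutoffs are hypothetical stand-ins for the worksheet's cutoff cells.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    text = f"{p:.{decimals}f}"
    if text.startswith("0"):
        text = text[1:]  # drop the leading zero, e.g. "0.032" -> ".032"
    stars = sum(1 for cutoff in cutoffs if p < cutoff)  # one asterisk per cutoff passed
    padding = len(cutoffs) - stars  # spaces keep unstarred values aligned
    return text + "*" * stars + " " * padding


for p in (0.2, 0.032, 0.004):
    print(repr(format_p_value(p)))  # '.200  ', '.032* ', '.004**'
```

In monospaced output like this, one space per missing asterisk is enough to line the values up; in a proportional worksheet font the amount of padding may need adjusting.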
To correct for this, I give the values that do not get an asterisk two spaces in its place, so the values with and without the asterisk line up. Be sure to enclose your custom-formatted characters, asterisks and spaces alike, in quotation marks. This is not just for hard-coded results: conditional formatting works the same on formula-derived values, too. For example, here we want to see if the difference in chi-square values between two models is significant.
I set up conditional formatting the same way as above, except the target cell contains a formula. It works exactly the same. One last pitch… If you plan to format p-values frequently, consider purchasing my Statistical Significance Formatter Add-In. It offers a dynamic, repeatable approach to formatting p-values, so you can easily change your cutoff value without repeatedly setting up the special cell formats.
Maybe I could do these stats more easily in R. But to present a narrative with my data, Excel is still an excellent choice.
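For the chi-square example above, the formula cell holds the p-value of the chi-square difference between two nested models. In Excel that might be something like =CHISQ.DIST.RT(difference, df difference), although the exact formula is not shown here, so treat that as an assumption. The sketch below computes the same right-tail probability with the Python standard library only, using the closed-form series for integer degrees of freedom; the model fit statistics in the usage lines are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # odd df: 2 * upper normal tail at sqrt(x)
    #         + 2 * normal density at sqrt(x) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2))  # equals 2 * upper normal tail at sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term = root  # r = 1 term: x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += 2 * density * term
    return total


def chi_square_difference_p(chi2_constrained: float, df_constrained: int,
                            chi2_free: float, df_free: int) -> float:
    """p-value for the chi-square difference between two nested models."""
    diff_df = df_constrained - df_free
    if diff_df < 1:
        raise ValueError("the constrained model must have more degrees of freedom")
    return chi2_sf(chi2_constrained - chi2_free, diff_df)


# Hypothetical fit statistics for a constrained model and a less constrained one.
p = chi_square_difference_p(85.3, 24, 79.1, 22)
print(round(p, 4))  # 0.045, i.e. exp(-6.2 / 2)
```

The resulting value is what the conditional formatting rule would then compare against the cutoff cell.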
Expressing Your Results
The theory behind p-values and the null hypothesis might seem complicated at first, but understanding the concepts will help you navigate the world of statistics. Unfortunately, these terms are often misused in popular science, so it is useful for everyone to understand the basics.

Null Hypothesis and p-Value

The null hypothesis is a statement, also referred to as a default position, which claims that there is no relationship between the observed phenomena.
It can also be applied to associations between two observed groups. During the research, you test this hypothesis and try to disprove it. For example, say you want to find out whether a particular fad diet has significant results. The null hypothesis is that the diet made no difference; the alternative hypothesis is that the diet did make a difference. This is what researchers would try to support. The p-value is the probability of obtaining a test statistic at least as extreme as the observed value if the null hypothesis is true under a given statistical model.
Though it is often expressed as a decimal number, it can also be expressed as a percentage; for example, a p-value of 0.05 is the same as 5%. A low p-value means that the evidence against the null hypothesis is strong.
This in turn suggests that your result is statistically significant. To show that the fad diet works, researchers would need to find a low p-value. A statistically significant result is one that would be highly unlikely if the null hypothesis were true. The significance level, denoted by the Greek letter alpha (α), is the cutoff: the p-value has to be smaller than alpha for the result to be statistically significant.
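To make the definition concrete, here is a small Python sketch of a p-value for the diet example. It uses an exact sign-flip permutation test, which is a different technique from the t-test Excel uses below: if the diet makes no difference (the null hypothesis), each before-minus-after difference is assumed equally likely to be positive or negative, so the sketch lists every sign pattern and counts how often the total is at least as extreme as the observed one. The weight differences and the alpha of 0.05 are hypothetical.

```python
from itertools import product


def sign_flip_p_value(diffs: list[float], one_tailed: bool = True) -> float:
    """Exact sign-flip permutation p-value for paired differences (before - after)."""
    observed = sum(diffs)
    at_least_as_extreme = 0
    patterns = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = sum(s * d for s, d in zip(signs, diffs))
        if one_tailed:
            # one-tailed: is this weight loss at least as large as the observed one?
            hit = flipped >= observed - 1e-9
        else:
            # two-tailed: is this change at least as large in either direction?
            hit = abs(flipped) >= abs(observed) - 1e-9
        at_least_as_extreme += hit
        patterns += 1
    return at_least_as_extreme / patterns


alpha = 0.05
weight_lost = [2, 1, 3, 2, 1]  # hypothetical kg lost by five dieters
for one_tailed in (True, False):
    p = sign_flip_p_value(weight_lost, one_tailed)
    print(f"one_tailed={one_tailed}: p = {p:.5f}, significant: {p < alpha}")
```

With these numbers, only the all-positive pattern is as extreme as the data in one direction, so the one-tailed p-value is 1/32 = 0.03125 (below alpha), while the two-tailed p-value is 2/32 = 0.0625 (above it).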
P-values are used in many fields; some of the prominent ones include sociology, criminal justice, psychology, finance, and economics. Though the steps below should generally apply to all versions of Excel, the layout of the menus will differ.
Create and populate the table. Our table has the Before Diet values in B2:B6 and the After Diet values in C2:C6. Click on any cell outside your table and type =T.TEST( into it. After the open bracket, type in the first argument. In this example, it is the Before Diet column.
The range should be B2:B6. Thus far, the function looks like this: =T.TEST(B2:B6. The After Diet column is our second argument, and the range we need is C2:C6, so the function becomes =T.TEST(B2:B6,C2:C6. Type in a comma after the second argument, and the one-tailed distribution and two-tailed distribution options will automatically appear in a drop-down menu. Double-click on the one you need; this example uses the one-tailed distribution. Type in another comma. Double-click on the Paired option in the next drop-down menu.
Now that you have all the elements you need, close the bracket. The cell will display the p-value immediately.

Data Analysis Route

The Data Analysis tool lets you do many cool things, including p-value calculations.
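First, make sure the Analysis ToolPak add-in is enabled. Then click on the Data tab in the main menu and select the Data Analysis tool. Choose t-Test: Paired Two Sample for Means from the list and click OK. A pop-up window will appear. Enter the first range in the Variable 1 Range field; in our example, it is B2:B6. Enter the second range in the Variable 2 Range field; in this case, it is C2:C6. Click on the Output Range radio button and pick where you want the result. Excel will calculate the p-value and several other parameters, such as the means, variances, and t Stat. As you can see in the output, the one-tail p-value, labeled P(T<=t) one-tail, is the same as in the first case.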
To recap the function route: after the second range, enter a comma and choose the tails, then enter another comma and select Paired, giving =T.TEST(B2:B6,C2:C6,1,1). Finally, press Enter to show the result. The displayed value may vary by a few decimal places depending on your number format and column width.
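To see roughly what =T.TEST(B2:B6,C2:C6,1,1) computes, here is a Python sketch of a paired t-test with a one- or two-tailed p-value. It is not Excel's implementation: it uses only the standard library and evaluates Student's t distribution with the closed-form series for integer degrees of freedom. As we understand T.TEST, it works with the absolute value of t, so the one-tailed result is the tail beyond |t|; the sketch does the same. The diet numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def t_central_prob(t: float, df: int) -> float:
    """P(|T| < |t|) for Student's t with integer df >= 1 (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # even df: sin(theta) * (1 + 1/2 cos^2 + (1*3)/(2*4) cos^4 + ...)
        coef = total = 1.0
        for j in range(1, df // 2):
            coef *= (2 * j - 1) / (2 * j)
            total += coef * c ** (2 * j)
        return s * total
    if df == 1:
        return 2 * theta / math.pi
    # odd df >= 3: (2/pi) * (theta + sin(theta) * (cos + 2/3 cos^3 + ...))
    coef, total = 1.0, c
    for j in range(1, (df - 1) // 2):
        coef *= (2 * j) / (2 * j + 1)
        total += coef * c ** (2 * j + 1)
    return 2 / math.pi * (theta + s * total)


def paired_t_test(before: list[float], after: list[float], tails: int = 1) -> tuple[float, float]:
    """Return (t, p) for a paired t-test; tails is 1 or 2, as in T.TEST."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    p_two_tailed = 1.0 - t_central_prob(t, n - 1)
    return t, (p_two_tailed / 2 if tails == 1 else p_two_tailed)


before_diet = [80, 82, 85, 90, 75]  # hypothetical B2:B6
after_diet = [78, 81, 82, 88, 74]  # hypothetical C2:C6
t, p = paired_t_test(before_diet, after_diet, tails=1)
print(f"t = {t:.3f}, one-tailed p = {p:.4f}")
```

With these hypothetical numbers the sketch gives t of about 4.81 and a one-tailed p-value of about 0.0043; passing tails=2 doubles the p-value, mirroring the one-tail and two-tail rows of the Data Analysis output.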
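If the p-value is less than the significance level alpha, the result is statistically significant and you can reject the null hypothesis; if it is greater than alpha, the result is not statistically significant. You can change the alpha value, though the most common choices are 0.05 and 0.01. Choosing two-tailed testing can be the better choice, depending on your hypothesis.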
In the example above, one-tailed testing means we explore whether the test subjects lost weight after dieting, and that is exactly what we needed to find out. But a two-tailed test would also examine whether they gained statistically significant amounts of weight.
The p-Value Demystified

Every statistician worth his or her salt has to know the ins and outs of null hypothesis testing and what the p-value means. This knowledge will also come in handy to researchers in many other fields.
6 Ways to Visualize Statistical Significance
Numbers as Bar Charts

In these two tables, I created dot plots with numbers instead of dots or another marker (FiveThirtyEight has a nice example of this technique). I also used boldface text for the statistically significant results to help further highlight them. In this version, I take the same approach and also fill the statistically significant cells.

Dot Plots

Moving away from the tables, I played around with different dot plots.
Of course, I could have used a standard column chart or some kind of box-and-whisker chart, but I think the dot plot might be the best choice (plus, the column chart approach has some issues). I tried four different dot plots, just playing around with different kinds of labeling:

Version 1 plots just the estimates, with error bars that I made up based on the level of statistical significance.

This required me to shrink the text size just a bit and blow up the size of the bubble.

How to add error bars in PowerPoint and Excel
You can show the uncertainty around your values by adding error bars to your graph, most commonly the standard deviation (SD), the standard error of the mean (SEM), or confidence intervals (CI). Statistics packages will do this for you, but what if you want to create the graph using good old Microsoft programs?
Step 2: Tell PowerPoint you want error bars on your chart or graph

Error bars count as a chart element. You can add them in a couple of different ways: from the ribbon, or with the button on the graph itself. When you click the button, choose More options, and a formatting pane should open on the right-hand side of PowerPoint.
Make sure the chart icon at the top of the pane is selected. Working from the top down, you can first choose which direction the error bars should go in. You can also choose whether you want them capped or not. Finally, you can choose the error amount.
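If you choose a custom error amount, you have to supply the numbers yourself. Below is a minimal Python sketch of computing them for one group of values. The sample values are hypothetical, and the confidence-interval half-width uses a normal approximation (a z value of about 1.96 for 95%), which is an assumption on our part: for small samples a t critical value would be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev


def error_bar_amounts(values: list[float], confidence: float = 0.95) -> dict:
    """Mean plus SD, SEM and CI half-width for one group of observations."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(len(values))  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # normal approximation to the CI
    return {"mean": mean(values), "SD": sd, "SEM": sem, "CI half-width": z * sem}


group = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
for name, amount in error_bar_amounts(group).items():
    print(f"{name}: {amount:.3f}")
```

Whichever amount you pick (SD, SEM, or the CI half-width) is the value you would enter for the positive and negative error of that data point.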