Description of the Scientific Process: Guide for Using Excel for Statistics and Charts


Guide for Using Excel for Statistics and Charts

This is a quick guide for using Excel (2007 and later) to solve statistics problems and create charts for you science fair project. This guide is intended to be a starting point for you, since every project is different. This guide also assumes that you have access to the document “Description of the Scientific Process.”

Basic Terms and Concepts

 image001.png

Figure 1: Cells in Excel. The active cell here is A1.

To start with, you will need to understand some basic terms and concepts about Excel. If you are totally unfamiliar with the software, then this guide might be too advanced for you, but there are many great guides available online for this ubiquitous program. Excel is a spreadsheet[1] program. As such, each open sheet has a grid of cells with alphabetic column headers and numeric row headers. Cells can be identified by their coordinates. The cell A1 is the top left corner cell; A2 is its neighbor to the right; and B1 is its neighbor below (Figure 1). You can fill cells with numbers, text, and formulas.

 

Figure 2: Sheet selector.

By default, a new Excel spreadsheet will have three sheets. You can remove, add, or rename these sheets, and they can reference each other (Figure 2).

When you enter your data, make sure you have a plan. In general, have a column for individual identification labels and columns for measurements, dates, and other notations that you make. If things look messy, you may want to move some things over to another sheet. As you work, familiarize yourself with the copy/cut/paste functionality, as well as insertion and deletion of columns and rows.

Excel functions are special formulas that you can input in order to make calculations or a few other specialized activities. Anytime you begin filling a cell with the equal sign (“=”), you are telling Excel that you want it to do something. Fill the cells A1 and A2 with numbers. Now type “=A1+A2” into any other cell. You will see the sum of those two numbers. “A1-A2” yields the difference; “A1*A2” reveals the product; and “A1/A2” calculates the division of the value in A1 by the value in A2. You may use parentheses to make more complicated functions.

 image003.png

Figure 3: The Function Library.

Excel provides a whole host of formulas that you can use to perform all sorts of calculations and other manipulations. In general, each formula is written as =FORMULA(argument1,argument2,argument3,…). Each of those arguments is specific to the formula you are using, and Excel will help you find out what you need in those fields. You can find the full list of formulas in the Function Library on the Formula Ribbon (Figure 3).

Frequently, you will use the notation “Column1Row1:Column2Row2.” You can use this to select multiple cells for a given for a given function, something that is necessary for many of them. As an example, “A1:B2” selects the four cells A1, A2, B1, and B2 in what is called a table. Sometimes, we will encourage you to insert dollar signs (“$”) before column letters or row numbers. When you copy and paste cells that contain these symbols, those particular columns and rows will remain unchanged in the cells in which you pasted. Play around with some of these things. If you’ve got a blank spreadsheet, you really won’t hurt anything.

Simple Statistical Functions

Let’s start with the simplest functions regarding averages and variability.

  • You may add many cells using the function =SUM(number1,number2,). Those numbers, may of course, be cells or fields. For example, =SUM(A1:A10) adds all of the numbers in column A, rows 1 through 10.
  • You may count values using the function =COUNT(number1,number2,). It will count all numbers listed.You may compute the mean using the function =AVERAGE(number1,number2,).You may compute the median using the function =MEDIAN(number1,number2,).
    • There are useful variations on this function called =COUNTIF and =COUNTIFS which you can use to count only specific kinds of numbers. Look them up, if you think you need them.
  • You can find the mode using the function =MODE.SNGL(number1,number2,).
  • To find the variance of the population[2], use the function =VAR.P(number1,number2,).
  • To find the sample variance, then use the function =VAR.S(number1,number2,).
  • To find the standard deviation of the population, use the function =STDEV.P(number1,number2,).
  • To find the sample standard deviation, use the function =STDEV.S(number1,number2,).
  • In order to calculate the sample proportion, use =COUNTIF(range,criteria)/COUNT(number1,number2,).Calculate square roots with SQRT(number).
    • Range must be a table, i.e. A1:A20.
    • Criteria is the condition in which Excel will count your data. If you scored hits as 1’s and misses as 0’s, then your formula should be =COUNTIF(A1:A20,”1”)/COUNT(A1:A20).
  • You can create a 95% confidence interval in you spreadsheet. You calculate the standard error () with the function =
    • You may then subtract and add twice the resulting standard error from the mean to get the confidence interval.
    • If you want to use a 98% confidence interval, change the first term from 0.05 to 0.02.
  • You can calculate the margin of error at 95% confidence with =1.96*SQRT(((COUNTIF(range,criteria)/COUNT(number1,number2,))*(1-COUNTIF(range,criteria)/COUNT(number1,number2,)))/COUNT(number1,number2)
    • You may then subtract and add the resulting margin of error from the sample proportion to generate a 95% confidence interval.
    • If you want to use a 98% confidence interval, substitute 2.33 for 1.96.
  • To perform a hypothesis test on a set of data against a known constant, use =T.DIST.2T(ABS(AVERAGE(number1,number2,)-hypothesis)/(STDEV.S(number1,number2,)/SQRT(COUNT(number1,number2))),COUNT(number1,number2)-1)
    • This is the test for the hypothesis that .
    • The function =ABS(number) generates the absolute value of that number.
    • If you want to check against the hypothesis , substitute T.DIST for T.DIST.2T and add ,FALSE immediately before the final ).
    • If you want to check against the hypothesis , substitute T.DIST.RT for T.DIST.2T.
    • Each of these tests returns the p-value.
  • To perform a Student’s t-test with two continuous datasets, use the function =T.TEST(array1,array2,tails,type).
    • Array1 and array2 are the ranges of the two sets of data you are comparing.
    • Tails = 1 if the test is one-tailed. Tails = 2, if the test is two-tailed.
    • Type refers to the parameters of your test. Type = 1, if the data are paired. Type = 2, if the samples have equal variance. Type = 3, if the samples have unequal variance.
    • The output will be a p-value.
  • To perform a z-test on binomial data against a hypothesis, use the function =NORM.DIST((AVERAGE(array1)-hypothesis)/SQRT((hypothesis*(1-hypothesis))/COUNT(range1)),0,1,TRUE)
    • This tests the hypothesis that .
    • It returns a p-value.
  • To perform a z-test on binomial data against a hypothesis, first use the function =AVERAGE(range1)*COUNT(range1)/(COUNT(range1)+COUNT(range2))+AVERAGE(range2)*COUNT(range2)/(COUNT(range1)+COUNT(range2)). This is the (p-hat the proportion of successes for the combined samples.) In another cell, insert the function =NORM.S.DIST((AVERAGE(range1)-AVERAGE(range2))/SQRT(p-hat*(1-p-hat)*(1/COUNT(range1)+1/COUNT(range2))),FALSE).
    • This test returns a p-value.
  • To generate a Pearson coefficient (), use the function =PEARSON(array1,array2).To generate a p-value from a Pearson coefficient, use the function =T.DIST.2T(PEARSON(array1,array2)*SQRT((COUNT(array1)-2))/(1-PEARSON(array1,array2)^2)),COUNT(array1)-2).
    • Both arrays have to have the same number of cells, since all of the values should be paired for individuals.
    • You can add ^2 to the end to get the value .

There are, of course, other tests that you may want to perform. Some of those are explicitly listed in the Excel Function or are Data Analysis options. You can also look at the abundant examples of worked problems available on the internet.


Making Charts

 

Figure 4: Chart selector in Excel.

In general, chart making in Excel is easy, but it can be hard to get things just right. Once you have some data in your spreadsheet, go to the Insert ribbon and go to the Charts area (Figure 4). There, you will find a variety of charts to choose from. You can select your data area before you click the chart type that you want, and Excel will try to interpret what you want to do. Sometimes it’s right, and sometimes it’s wrong, so you will need to check what it is doing. If you choose not to select your data first, then Excel will produce a blank chart. You can select that and begin working.

 image017.png

Figure 5: Chart Tools tabs.

Once selected, three new Chart Tools tabs will appear in the ribbon: Design, Layout, and Format[3] (Figure 5). Each one contains several tools that you can use to specify what your chart is about and to change how the chart looks. We are not going over everything you can do, but you should know the very basics.

 image018.png

Figure 6: Select Data Source window.

The most important button in the Design tab is the Select Data button. That allows you to tell Excel exactly what data you’re using for the chart. Once you have clicked it, you will be able to add and edit data series (Figure 6). Data series are related data points (i.e., the pH of each individual in your control sample.) If you click the Add or Edit button, you can select the data for the series, as well as name the series.

The Layout tab offers up numerous important options for customizing your chart. You can use the buttons in this tab to add axis labels, a chart title, a legend, and various other options particular to the kind of chart you are working with. We are not going into depth here, and you should experiment. If it looks like you should be about to do something, then you probably can, and we recommend searching the internet for help on any particular thing.

image019a.png

Figure 7: The default Excel color scheme.

The Format tab is mostly about the aesthetics of your chart. You can pick colors, fonts, and other visual modifications in this tab. If you are using the chart in a slideshow, paper, or poster, then you need to make it look special. Excel starts with a particular color scheme that is pleasant (Figure 7), but everyone alread knows it, and they will see that you put no effort into making your figures your own. If color will cost you extra money, then figure out how to use grayscale with your data.

Generating Random Patterns with Excel

Often in science, it’s important to randomize your experimental design. You may need to measure things in a random order or lay them out in a grid without bias introduced from a predictable pattern. There are different ways to do this, and Excel is a readily available tool. Start by listing your individuals, treatments, or whatever else it is that you need to randomize down one column. In a column next to that one, type in the function =RAND(). That will generate a random number between 0 and 1. Select the list you created and the random numbers. In the Home tab, select the Sort and Filter pull-down menu and click Custom. Click Sort By and pick the column with random numbers. Click OK, and your list is randomized. You can delete the random numbers as well. If you are keeping other columns of data or other associated information with the first column, make sure to select them before you sort, or they will stay in their original cells instead of maintaining their association.

 


[1] We don’t thank accountants enough for things, and the spreadsheet is biggest contribution to society from the world of accounting. Without it, so many useful tasks would be much harder, and early business demand for computers would have been much less, putting us years behind our current computing technology. So, thank you, accountants. You quietly make the world a better place.

[2] Remember the difference between the variance of the population and the sample variance. If you are looking at your data, and it is only a representative sample of the whole population, then should use the sample variance.

[3] I can’t say that Microsoft made good decisions in naming these tabs. They’re all basically synonyms, but you have to work with what you’re given.