Chapter 3 Excel
In this chapter we will go over the commonly used software tool for recording research data, Excel. You will learn the best practices for structuring your data for analysis, and how to view and navigate your data within Excel.
3.1 Spreadsheet Set-Up
Spreadsheets are graphical interfaces that organize information into rows, columns, and cells. It allows researchers to organize and store their data. There are many different spreadsheet programs: Google Sheets, Numbers, and Excel.
Excel is the most commonly used across researchers, and when you open the software you see s spreadsheet. The values inside of each cell, represents your data.
It is important to understand the significance of how your data is formatted and stored before you begin the collection stage (sometimes this isn’t possible). This is because you want to ensure the information you need to conduct your analysis is recorded in the correct way, and that you are not missing information you may need later. For your spreadsheet set-up there are best practices that we will cover.
3.1.1 Best Practices
When entering your data, it is important to remember these 5 best practices:
1. Variables are Columns
The best thing to do is to remember that variables = columns. These are the things you are measuring like temperature, or weight.
2. Observations are Rows
Since your variables = columns, observations = rows. Each row represents something that is being measured, like a fish.
3. One Piece of Information Per Cell
A good saying to use to remember this is “One variable per column, One observation per row, and One piece of information per cell.
This would be 1 length measurement, or the name of a species, the year, the month, etc.
4. Raw Data Stays Raw
When you enter your data into a spreadsheet, that is considered raw data. That data hasn’t been cleaned, altered, or manipulated. Essentially, unchanged. Never Change It!
When working on a research project, mistakes happen. It’s important to be proactive and avoid mistakes where possible. If something happened while you were working on your dataset, you want to be able to have a back-up just in case.
If you want to clean it, reformat it, or have the chance to go back and double check your data, make a copy of the spreadsheet. This way, you can ALWAYS refer back to the original in case something goes wrong!
This is the same for exporting data after it has been wrangled, always save the final or cleaned version as a separate data file to use for analysis.
5. Save as a .csv
Saving your spreadsheet as a .csv file ensures that everyone can open and use it because it is a recognized format across software. Remember the previous chapter where different file formats and extensions are used or required by different software, it is always safe to use the most accessible options. This way, there are less chances for data to be altered or converted to different formats when exporting across software.
3.1.2 Dealing with NA’s
When you have to enter in data that doesn’t exist, or have a “blank” for, using NA is the best way to avoid issues when you get to importing the data into a software tool.
It is important to remember that NA is not the same thing as Na or na. Many software tools are case sensitive, which applies to more than NA values.
Yes is not the same thing as yes or Y, so you always want to make sure to be consistent in how you are recording your data.
3.1.3 Working with Dates
Dates are a nightmare, simple as that. There are so many different formats, and all of them are used. There is no format that is globally used, 100% of the time, consistently.
Excel treats dates the same way, with zero regard for consistency. This can cause a variety of issues by reformatting columns, which will leave you with the task of checking back you your logbooks to re-enter the correct dates. There is a way to avoid all formatting mistakes for Dates and Times
Remember the best practices, 1 variable per column, 1 observation per row, and 1 piece of information per cell. A date is 3, year, month, day! It could also include time in the date, which is another headache.
So remember those best practices and separate your dates into year, month, day variables!
3.2 Variable naming
An important thing to remember when you are setting up your spreadsheets, think like a programmer. We want to make our lives easier down the road, and a good way to do that is through variable naming.
The first thing to do is stop using dots or periods.
Using a period in your variable name, is a big red flag. Periods aren’t recognized across software tools, and when trying to import data it could lead to parsing issues.
Always play it safe, and avoid using them. We can use different naming conventions for variables, but whatever you choose be consistent. There are many to pick from, but we will focus on 4 of the most common:
3.2.1 Pascal Case
Pascal case is where the first letter of every word is a capital letter. This means that in a variable with multiple words, each word can be seen because of the capital letter.
This naming convention is mostly used to name Classes in the programming languages like C#, JavaScript, Java, and Python.
3.2.2 Camel Case
Camel case is similar to pascal case however the first word is always lower case, and every additional word begins with a capital letter. If a variable is one word, it is all lowercase.
This naming convention is mostly used to name variables, functions, and methods in programming languages like JavaScript & Java.
3.2.3 Snake Case
Snake case is what you will see the most throughout this online guide. It is lowercase, with each word or character separated with an underscore “_“.
Snake case is widely used in naming variables, functions, and methods in the Python programming language.
3.3 Viewing Data in Excel
Although Excel can be used for creating tables, plots, and statistics, it is best to limit the use of Excel to viewing data.
This is because of the drag-and-drop nature of spreadsheet programs, and the unpredictability of cut and paste. It is very easy to accidentally apply a different formula to adjacent cells, and to retrace your steps.
One of the biggest issues that is more common with using Excel for research purposes, is the inability to handle large datasets. In some cases a dataset can span over decades and generate over a million rows, which happens to be the limit when viewing data.
This can cause problems as sometimes it is hard to visualize or picture data that you are unable to see!
3.3.1 Filter
When viewing your data in excel and exploring or filtering data, we will always use the filter option. Never use the sort functions or options. There are too many things that can go wrong when using it. This is because sort works on the specified column(s), not the entire dataset.
To avoid the potential mistakes that could happen, user the Filter button under the Home tab in the toolbar! This will ensure that when filtering by a column or multiple columns, the entire dataset is filtered and not just the columns included within the filter.
INCLUDE A VISUAL HERE
3.3.2 Fonts & Formatting
Colours and fancy fonts may look nice, but they are not needed. When importing data into software tools used for data analysis, fonts and colours are either ignored, or not recognized and will cause an error.
The best way to avoid this issue, is to keep it simple.
This is also the same case for special characters like: #$%^&*()
Try to avoid using special characters in your spreadsheets because these characters may not be recognized, or they could be confused with a different operator or function.
3.4 Chapter Wrap-Up
Learning how to use Excel can be challenging. There are many different things that you can do with Excel, but without the expertise it is a recipe for disaster. Minimizing the potential for errors is why we limit the use of Excel for data entry and viewing.
You should now have a general understanding of why spreadsheet set-up is important, and be well-versed in formatting best practices, dealing with NA values and dates, and how to name your variables.
The data spreadsheet itself is important but it represents just 1 half of research data in its entirety, in the next chapter you will learn about the other half.
3.4.1 Chapter Terms & Definitions
Here is a summary of some of the bolded terms used throughout this chapter, refer back to this list whenever you need a refresher!
PUT TERM LIST HERE
3.4.2 Additional Links & Resources
If you are still confused, or you would like more information on topics discussed in this chapter, check out the following links: