By: Sofia Salazar and Hannah Providence
Data visualizations are important to any research paper because they can illustrate trends that are not obvious in raw data. Visual learning is one of the most popular learning styles, and a strong visual can draw your audience’s attention to the results being presented.
In this post, we’ll cover:
- Incorporating data visualizations
- Determining how to best display data
- Cleaning data
- Recommending data visualization tools
Incorporating Data Visualizations
Different theses may use different methods for data visualization, including:
- Heat maps
- Pie charts
- Bar graphs
- Scatter Plots
- Line graphs
- Timelines
- Cartograms
- Tree Diagrams
Each of these methods serves a specific purpose and has its own strengths when displaying qualitative or quantitative data, as well as nominal, ordinal, or interval variables.
Determining How To Best Display Data
Knowing which method of data visualization to use is crucial to avoid presenting misleading information.
Deciding whether variables are nominal (categories that cannot be ranked), ordinal (can be ranked but not quantified), or interval (can be ranked and quantified) is one of the first steps in visualization, and determines whether data is qualitative or quantitative. Nominal and ordinal variables are qualitative data unless coded, while interval data is quantitative.
Quantitative (numerical) data is best represented by bar graphs, histograms, pie charts, scatter plots, and often cartograms. Qualitative (categorical) data is best represented by timelines, flowcharts, tables, and tree diagrams.
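The guidance above can be written down as a small lookup. This is only a sketch of this post's rules of thumb, not a standard library or an exhaustive rule set: the names `DATA_KIND`, `SUGGESTED_CHARTS`, and `suggest_charts` are made up for illustration, and the sketch ignores the case where nominal or ordinal variables have been coded as numbers.

```python
# Measurement level -> kind of data, following the definitions in this post.
DATA_KIND = {
    "nominal": "qualitative",    # categories that cannot be ranked
    "ordinal": "qualitative",    # can be ranked but not quantified
    "interval": "quantitative",  # can be ranked and quantified
}

# Kind of data -> chart types suggested in this post (hypothetical lookup).
SUGGESTED_CHARTS = {
    "quantitative": ["bar graph", "histogram", "pie chart", "scatter plot", "cartogram"],
    "qualitative": ["timeline", "flowchart", "table", "tree diagram"],
}


def suggest_charts(level: str) -> list[str]:
    """Return the chart types this post suggests for a measurement level."""
    kind = DATA_KIND[level.strip().lower()]
    return SUGGESTED_CHARTS[kind]


print(suggest_charts("ordinal"))
```

Treat the result as a starting point. The right chart still depends on your research question and your audience.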
Finally, data visualization may require the use of color to highlight categories or different aspects of a whole. Color works best when it means something in the data. If there is no difference to point out through color, it is best not to use it. Be sure to use hex color codes so that the colors representing your data stay consistent. Choosing between a sequential color theme (shades of the same color) and a diverging one (opposing colors) can also strengthen your data representation.
Here are some tips to help with key factors of data visualization:
- When using figures, be sure to label them below the figure
- When using tables, be sure to label them above the table
- Make sure axes start at zero to avoid misleading readers with your data
- Using consistent hex color codes can make trends and differences across variables easier to see, but only when color matters!
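To make the advice on hex codes and sequential color themes concrete, here is a minimal sketch that builds a sequential theme (shades of one color) from a single hex code. The function names and the blend-from-white approach are assumptions chosen for illustration. Tools like Tableau or Excel ship their own palettes, and this is not how any of them work internally.

```python
def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert a hex color such as '#1f77b4' to an (R, G, B) tuple."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb) -> str:
    """Convert an (R, G, B) tuple back to a lowercase hex color code."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_palette(base_hex: str, steps: int) -> list[str]:
    """Blend from white toward base_hex in evenly spaced shades.

    The last shade is the base color itself; earlier shades are lighter.
    """
    base = hex_to_rgb(base_hex)
    white = (255, 255, 255)
    palette = []
    for i in range(1, steps + 1):
        t = i / steps  # fraction of the way from white to the base color
        shade = tuple(round(w + (b - w) * t) for w, b in zip(white, base))
        palette.append(rgb_to_hex(shade))
    return palette


# Example: four shades of one blue, lightest first.
print(sequential_palette("#1f77b4", 4))
```

Writing the resulting hex codes down and reusing them in every figure keeps a category the same color throughout your thesis.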
Cleaning Your Data
Data cleaning is how analysts prep their data before analysis. It is the process of finding and correcting inaccurate records and removing, adding, or recoding incomplete data in your dataset.
Why is cleaning data important?
Data cleaning improves the efficiency of data analysis by getting rid of formatting errors and false values before you even begin your analysis. This way, your formulas run correctly the first time and you can feel confident that you’re working with a strong dataset (Sharma, 2020).
Cleaning data is essential to the analytical process. “Data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it” (Lohr, 2014). As data becomes more readily available, the error margin continues to increase.
Here are key steps to clean your data (using Excel):
- Get rid of extra spaces
  - You want all of your rows and columns to align with each other. Often, you are faced with data that has extra spaces in some columns.
  - To remove the extra spaces, use the =TRIM() function. It removes leading and trailing spaces and leaves only a single space between words.
- Remove duplicates
  - You can highlight all of the duplicated values with conditional formatting
    - Highlight the row or column that you want to check for duplicates (or the whole spreadsheet)
    - Conditional Formatting > Highlight Cells Rules > Duplicate Values
- Check for errors
  - You can also use conditional formatting to highlight formula errors
    - Conditional Formatting > New Rule
    - The New Formatting Rule dialog box appears
    - Format only cells that contain > Errors > Format (suggested: red) > OK
- Uniform text casing
  - You can use text functions to give all of the text in a particular row, column, or worksheet the same casing
    - LOWER() converts all text into lower case
    - UPPER() converts all text into UPPER CASE
    - PROPER() converts all text into Proper Case
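If your data lives outside Excel, the same cleaning steps can be sketched in a few lines of Python. This is a rough analogue under stated assumptions, not a replacement for the Excel workflow above: `trim` and `remove_duplicates` are hypothetical helper names, `str.title()` only roughly mirrors PROPER(), `trim` collapses every kind of whitespace rather than only plain spaces, and the helper drops duplicates instead of highlighting them. The city names are made-up sample values.

```python
def trim(text: str) -> str:
    """Roughly mirror =TRIM(): strip the ends and collapse inner gaps to one space."""
    return " ".join(text.split())


def remove_duplicates(values):
    """Keep the first occurrence of each value, preserving the original order."""
    seen = set()
    unique = []
    for value in values:
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


# Made-up sample column with stray spaces and inconsistent casing.
raw = ["  new   york ", "NEW YORK", "los  angeles", "New York"]

# Trim first, then unify the casing (similar in spirit to PROPER()).
cleaned = [trim(name).title() for name in raw]

print(remove_duplicates(cleaned))
```

The order matters here: duplicates such as "NEW YORK" and "  new   york " only become detectable after the spacing and casing have been made uniform.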
Tools to make data visualization easier
- Visme: Presenting your research project? This tool creates beautiful presentations with unique themes and animations. You can also present your tables, graphs, heat maps, etc. with a few clicks. The visualizations are modern and customizable.
- Excel: Create data visualizations right from your dataset in Excel with pivot tables and other graphics. This will help you summarize your data and find relationships within it.
- Tableau and Power BI: A couple of steps up from Excel, these programs focus heavily on the visualization part of data. Create interactive dashboards and identify the patterns, changes, and density of your data through an array of graphs and charts that are easy for anyone to read.
- In fact, Dr. S is teaching ENGL 4496, Data Storytelling for Social Impact, next semester, and it comes with a FREE Tableau certification exam (normally $200)!
- Infogram and Canva: Use these tools to create a nice one-pager, data report, or a beautiful and detailed infographic! These tools are easy to use and have plenty of templates to choose from. They don’t require as much brainpower as an analytical tool like Excel.