Data visualization is the representation of data through common graphics, such as charts, plots, infographics, and animations. These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand.
Data visualization serves many purposes, and it is not reserved for data teams. Management also uses it to convey organizational structure and hierarchy, while data analysts and data scientists use it to discover and explain patterns and trends. Harvard Business Review categorizes data visualization into four fundamental purposes: idea generation, idea illustration, visual discovery, and everyday data viz. We’ll delve deeper into these below:
Idea generation
Data visualization is commonly used to spur idea generation across teams. Visualizations are frequently used during brainstorming or Design Thinking sessions at the start of a project, where they support the collection of different perspectives and highlight the common concerns of the group. While these visualizations are usually unpolished and unrefined, they help set the foundation for the project by ensuring that the team is aligned on the problem it is looking to address for critical stakeholders.
Idea illustration
Data visualization for idea illustration helps convey an idea, such as a tactic or process. It is commonly used in learning settings, such as tutorials, certification courses, and centers of excellence. It can also represent organizational structures or processes, facilitating communication between the right individuals for specific tasks. Project managers frequently use Gantt charts and waterfall charts to illustrate workflows. Data modeling also uses abstraction to represent data flow within an enterprise’s information system, making it easier for developers, business analysts, data architects, and others to understand the relationships in a database or data warehouse.
Visual discovery
Visual discovery and everyday data viz are more closely aligned with data teams. While visual discovery helps data analysts, data scientists, and other data professionals identify patterns and trends within a dataset, everyday data viz supports the subsequent storytelling after a new insight has been found.
Everyday data viz
Data visualization is a critical step in the data science process, helping teams and individuals convey data more effectively to colleagues and decision-makers. Teams that manage reporting systems typically use defined template views to monitor performance. However, data visualization isn’t limited to performance dashboards. For example, while text mining, an analyst may use a word cloud to capture key concepts, trends, and hidden relationships within unstructured text. Alternatively, they may use a graph structure to illustrate relationships between entities in a knowledge graph. There are several ways to represent different types of data, and it’s important to remember that data visualization is a skill set that should extend beyond your core analytics team.
Types of data visualizations
The earliest forms of data visualization predate the 17th century and can be traced back to the Egyptians, who used them largely to assist in navigation. As time progressed, people applied data visualizations more broadly, such as in economic, social, and health disciplines. Perhaps most notably, Edward Tufte published The Visual Display of Quantitative Information, which illustrated that individuals could use data visualization to present data more effectively. His book continues to stand the test of time, especially as companies turn to dashboards to report their performance metrics in real time. Dashboards are practical data visualization tools for tracking and visualizing data from multiple data sources, providing visibility into how specific behaviors by a team, or by an adjacent team, affect performance. Dashboards include standard visualization techniques, such as:
- Tables: These consist of rows and columns used to compare variables. Tables can show a great deal of information in a structured way, but they can also overwhelm users who are simply looking for high-level trends.
- Pie charts and stacked bar charts: These graphs are divided into sections that represent parts of a whole. They provide a simple way to organize data and compare the size of each component to one another.
- Line charts and area charts: These visuals show changes in one or more quantities by plotting a series of data points over time and are frequently used within predictive analytics. Line charts use lines to show these changes, while area charts connect data points with line segments and fill the area beneath them, often stacking variables on top of one another and using color to distinguish between them.
- Histograms: This graph plots a distribution of numbers using a bar chart (with no spaces between the bars), representing the quantity of data that falls within a particular range. This visual makes it easy for an end user to identify outliers within a given dataset.
- Scatter plots: These visuals are beneficial in revealing the relationship between two variables and are commonly used within regression data analysis. However, these can sometimes be confused with bubble charts, which visualize three variables via the x-axis, the y-axis, and the size of the bubble.
- Heat maps: These graphical representations help visualize behavioral data by location. This can be a location on a map or even on a webpage.
- Treemaps: These display hierarchical data as a set of nested shapes, typically rectangles. Treemaps are great for comparing the proportions between categories via their area size.
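As one illustration of the techniques listed above, the binning step behind a histogram can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, not a substitute for a charting tool: the function names `histogram_counts` and `render_text_histogram`, the fixed bin width, and the sample values are all assumptions made for the example.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall into each range [start, start + bin_width)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Map every value to the lower edge of the bin that contains it.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def render_text_histogram(counts, bin_width):
    """Draw one row of '#' marks per bin, keeping empty bins so gaps stay visible."""
    if not counts:
        return ""
    lines = []
    edge, last = min(counts), max(counts)
    while edge <= last:
        n = counts.get(edge, 0)
        lines.append(f"{edge:>4}-{edge + bin_width:<4} {'#' * n}")
        edge += bin_width
    return "\n".join(lines)


# Hypothetical sample: integer response times in milliseconds, with one outlier.
sample = [12, 15, 17, 21, 22, 23, 25, 28, 31, 34, 95]
counts = histogram_counts(sample, 10)
print(render_text_histogram(counts, 10))
```

Keeping the empty bins in the rendering is a deliberate choice here: the run of empty ranges between the main cluster and the last bin is what makes the outlier stand out.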
Open-source visualization tools
Access to data visualization tools has never been easier. Open-source libraries, such as D3.js, allow analysts to present data interactively and engage a broader audience with new data. Some of the most popular open-source visualization libraries include:
- D3.js: A front-end JavaScript library for producing dynamic, interactive data visualizations in web browsers. D3.js uses HTML, CSS, and SVG to create visual representations of data that can be viewed in any browser. It also provides features for interactions and animations.
- ECharts: A powerful charting and visualization library that offers an easy way to add intuitive, interactive, and highly customizable charts to products, research papers, presentations, etc. ECharts is based on JavaScript and ZRender, a lightweight canvas library.
- Vega: Vega describes itself as a “visualization grammar,” providing support for customizing visualizations across large datasets that are accessible from the web.
- deck.gl: Part of Uber’s open-source visualization framework suite, deck.gl is a framework for exploratory data analysis on big data. It helps build high-performance, GPU-powered visualizations on the web.
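Several of these libraries are driven by a declarative description of the chart rather than by drawing commands. As a rough sketch, the Python below assembles a dictionary shaped like an ECharts bar chart option and serializes it to JSON. The helper name `bar_chart_option` and the sample figures are hypothetical, and the sketch assumes a web page would pass the parsed JSON to `setOption` on a chart created with `echarts.init`; nothing here renders a chart on its own.

```python
import json


def bar_chart_option(title, categories, values):
    """Build a dict shaped like an ECharts option for a simple bar chart."""
    if len(categories) != len(values):
        raise ValueError("categories and values must have the same length")
    return {
        "title": {"text": title},
        "tooltip": {},
        "xAxis": {"type": "category", "data": list(categories)},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": list(values)}],
    }


# Hypothetical data: tickets closed per week.
option = bar_chart_option("Tickets closed per week", ["W1", "W2", "W3"], [14, 9, 17])
print(json.dumps(option, indent=2))
```

Generating the option on the server keeps the data preparation in one place, while the browser remains responsible for rendering and interaction.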
Data visualization best practices
With so many data visualization tools readily available, there has also been a rise in ineffective information visualization. Visual communication should be simple and deliberate to ensure that your data visualization helps your target audience arrive at your intended insight or conclusion. The following best practices can help ensure your data visualization is helpful and clear:
Set the context: It’s important to provide general background information to ground the audience in why this particular data point is important. For example, if e-mail open rates were underperforming, we may want to illustrate how a company’s open rate compares to the overall industry, demonstrating that the company has a problem within this marketing channel. To drive action, the audience needs to understand how current performance compares to something tangible, like a goal, benchmark, or other key performance indicators (KPIs).
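The comparison against a benchmark can be computed before anything is drawn. The following is a minimal sketch of the open rate example; the function names, the message counts, and the 22% industry benchmark are made-up values for illustration, not real industry figures.

```python
def open_rate(opened, delivered):
    """Return the share of delivered e-mails that were opened."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return opened / delivered


def compare_to_benchmark(value, benchmark):
    """Summarize how a rate compares to a benchmark, in percentage points."""
    gap = value - benchmark
    return {
        "value": value,
        "benchmark": benchmark,
        "gap_points": round(gap * 100, 1),
        "below_benchmark": gap < 0,
    }


# Hypothetical campaign: 1,800 opens out of 10,000 delivered messages.
rate = open_rate(1800, 10000)
# Hypothetical industry benchmark of 22%.
summary = compare_to_benchmark(rate, 0.22)
print(f"Open rate {rate:.1%} vs. benchmark {0.22:.1%} "
      f"({summary['gap_points']:+.1f} points)")
```

Plotting the benchmark as a reference line next to the company’s own rate gives the audience the tangible comparison that the raw number lacks.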
Know your audience(s): Think about who your visualization is designed for, and then make sure your data visualization fits their needs. What is that person trying to accomplish? What kind of questions do they care about? Does your visualization address their concerns? You’ll want the data you provide to motivate people to act within the scope of their role. If you’re unsure if the visualization is clear, present it to one or two people within your target audience to get feedback, allowing you to make additional edits before a large presentation.
Choose an effective visual: Specific visuals are designed for specific types of datasets. For instance, scatter plots display the relationship between two variables well, while line graphs display time series data well. Ensure that the visual actually helps the audience understand your main takeaway. A mismatch between chart and data can do the opposite, confusing your audience further rather than providing clarity.
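One way to make this choice explicit is a small lookup from the purpose of a visual to a chart type. The sketch below encodes only the pairings described in this article; the purpose keys and the `suggest_chart` helper are hypothetical names, and the table is a starting point rather than a complete rule set.

```python
# Pairings taken from the chart descriptions in this article.
CHART_FOR_PURPOSE = {
    "relationship": "scatter plot",
    "relationship_three_variables": "bubble chart",
    "change_over_time": "line chart",
    "part_of_whole": "pie chart or stacked bar chart",
    "distribution": "histogram",
    "hierarchy": "treemap",
    "by_location": "heat map",
    "structured_comparison": "table",
}


def suggest_chart(purpose):
    """Return the chart type this article associates with a given purpose."""
    try:
        return CHART_FOR_PURPOSE[purpose]
    except KeyError:
        known = ", ".join(sorted(CHART_FOR_PURPOSE))
        raise ValueError(f"unknown purpose {purpose!r}; expected one of: {known}") from None


print(suggest_chart("change_over_time"))
```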
Keep it simple: Data visualization tools make it easy to add all kinds of information to your visual. However, just because you can doesn’t mean you should! In data visualization, you want to be deliberate about the additional information you add so that you focus user attention. For example, do you need data labels on every bar in your bar chart? Perhaps you only need one or two to help illustrate your point. Do you need a variety of colors to communicate your idea? Are you using colors accessible to a wide range of audiences (e.g., accounting for color-blind audiences)? Design your data visualization for maximum impact by eliminating information that may distract your target audience.
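The data label question can be handled programmatically by labeling only the bars that carry the point. Here is a minimal sketch, assuming the point of the chart is its largest values; the `labels_to_show` helper and the sales figures are hypothetical.

```python
def labels_to_show(values, keep=2):
    """Return a label for the `keep` largest values and an empty string for the rest."""
    if keep < 0:
        raise ValueError("keep must not be negative")
    # Rank positions by value, largest first; ties go to the earlier position.
    ranked = sorted(range(len(values)), key=lambda i: (-values[i], i))
    top = set(ranked[:keep])
    return [str(v) if i in top else "" for i, v in enumerate(values)]


# Hypothetical monthly sales figures for a five-bar chart.
monthly_sales = [120, 95, 180, 110, 175]
print(labels_to_show(monthly_sales))
```

The resulting list lines up with the bars one to one, so it can be handed to whichever charting tool draws the labels.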