Home
/
Blog
/
Developer Insights
/
Data visualization for beginners - Part 1

Data visualization for beginners - Part 1

Author
Shubham Gupta
Calendar Icon
May 10, 2018
Timer Icon
3 min read
Share

This is a series of blogs dedicated to different data visualization techniques used in various domains of machine learning. Data Visualization is a critical step for building a powerful and efficient machine learning model. It helps us to better understand the data, generate better insights for feature engineering, and, finally, make better decisions during modeling and training of the model.

For this blog, we will use the seaborn and matplotlib libraries to generate the visualizations. Matplotlib is a MATLAB-like plotting framework in python, while seaborn is a python visualization library based on matplotlib. It provides a high-level interface for producing statistical graphics. In this blog, we will explore different statistical graphical techniques that can help us in effectively interpreting and understanding the data. Although all the plots using the seaborn library can be built using the matplotlib library, we usually prefer the seaborn library because of its ability to handle DataFrames.

We will start by importing the two libraries. Here is the guide to installing the matplotlib library and seaborn library. (Note that I’ll be using matplotlib and seaborn libraries interchangeably depending on the plot.)

### Importing necessary library  
import random  
import numpy as np  
import pandas as pd  
import seaborn as sns  
import matplotlib.pyplot as plt  
%matplotlib inline  

Simple Plot

Let’s begin by plotting a simple line plot which is used to plot a mathematical. A line plot is used to plot the relationship or dependence of one variable on another. Say, we have two variables ‘x’ and ‘y’ with the following values:

x = np.array([ 0, 0.53, 1.05, 1.58, 2.11, 2.63, 3.16, 3.68, 4.21,  
        4.74, 5.26, 5.79, 6.32, 6.84])  
y = np.array([ 0, 0.51, 0.87, 1. , 0.86, 0.49, -0.02, -0.51, -0.88,  
        -1. , -0.85, -0.47, 0.04, 0.53])  

To plot the relationship between the two variables, we can simply call the plot function.

### Creating a figure to plot the graph.  
fig, ax = plt.subplots()  
ax.plot(x, y)  
ax.set_xlabel('X data')  
ax.set_ylabel('Y data')  
ax.set_title('Relationship between variables X and Y')  
plt.show() # display the graph  
### if %matplotlib inline has been invoked already, then plt.show() is automatically invoked and the plot is displayed in the same window.  
Data Visualization Technique: Simple Plot - Relationship between X&Y
Fig. 1. Line Plot between X and Y

Here, we can see that the variables ‘x’ and ‘y’ have a sinusoidal relationship. Generally, .plot() function is used to find any mathematical relationship between the variables.

Histogram

Machine learning challenge, ML challenge

A histogram is one of the most frequently used data visualization techniques in machine learning. It represents the distribution of a continuous variable over a given interval or period of time. Histograms plot the data by dividing it into intervals called ‘bins’. It is used to inspect the underlying frequency distribution (eg. Normal distribution), outliers, skewness, etc.

Let’s assume some data ‘x’ and analyze its distribution and other related features.

### Let 'x' be the data with 1000 random points.   
x = np.random.randn(1000)  

Let’s plot a histogram to analyze the distribution of ‘x’.

plt.hist(x)  
plt.xlabel('Intervals')  
plt.ylabel('Value')  
plt.title('Distribution of the variable x')  
plt.show()  
Data Visualization Techniques: Histogram of variable x
Fig 2. Histogram showing the distribution of the variable ‘x’.

The above plot shows a normal distribution, i.e., the variable ‘x’ is normally distributed. We can also infer that the distribution is somewhat negatively skewed. We usually control the ‘bins’ parameters to produce a distribution with smooth boundaries. For example, if we set the number of ‘bins’ too low, say bins=5, then most of the values get accumulated in the same interval, and as a result they produce a distribution which is hard to predict.

plt.hist(x, bins=5)  
plt.xlabel('Intervals')  
plt.ylabel('Value')  
plt.title('Distribution of the variable x')  
plt.show()  
Data Visualization Techniques: Histogram with low number of bins
Fig 3. Histogram with low number of bins.

Similarly, if we increase the number of ‘bins’ to a high value, say bins=1000, each value will act as a separate bin, and as a result the distribution seems to be too random.

plt.hist(x, bins=1000)  
plt.xlabel('Intervals')  
plt.ylabel('Value')  
plt.title('Distribution of the variable x')  
plt.show()  
Data Visualization Techniques: Histogram with low bins
Fig. 4. Histogram with a large number of bins.

Kernel Density Function

Before we dive into understanding KDE, let’s understand what parametric and non-parametric data are.

Parametric Data: When the data is assumed to have been drawn from a particular distribution and some parametric test can be applied to it

Non-Parametric Data: When we have no knowledge about the population and the underlying distribution

Kernel Density Function is the non-parametric way of representing the probability distribution function of a random variable. It is used when the parametric distribution of the data doesn’t make much sense, and you want to avoid making assumptions about the data.

The kernel density estimator is the estimated pdf of a random variable. It is defined as
Kernel density equation
Similar to histograms, KDE plots the density of observations on one axis with height along the other axis.

### We will use the seaborn library to plot KDE.  
### Let's assume random data stored in variable 'x'.  
fig, ax = plt.subplots()  
### Generating random data  
x = np.random.rand(200)   
sns.kdeplot(x, shade=True, ax=ax)  
plt.show()  
Data visualization using Kernel Density Function
Fig 5. KDE plot for the random variable ‘x’.

Distplot combines the function of the histogram and the KDE plot into one figure.

### Generating a random sample  
x = np.random.random_sample(1000)  
### Plotting the distplot  
sns.distplot(x, bins=20)  
Data Visualization: Distplot using seaborn
Fig 6. Displot for the random variable ‘x’.

So, the distplot function plots the histogram and the KDE for the sample data in the same figure. You can tune the parameters of the displot to only display the histogram or kde or both. Distplot comes in handy when you want to visualize how close your assumption about the distribution of the data is to the actual distribution.

Scatter Plot

Scatter plots are used to determine the relationship between two variables. They show how much one variable is affected by another. It is the most commonly used data visualization technique and helps in drawing useful insights when comparing two variables. The relationship between two variables is called correlation. If the data points fit a line or curve with a positive slope, then the two variables are said to show positive correlation. If the line or curve has a negative slope, then the variables are said to have a negative correlation.

A perfect positive correlation has a value of 1 and a perfect negative correlation has a value of -1. The closer the value is to 1 or -1, the stronger the relationship between the variables. The closer the value is to 0, the weaker the correlation.

For our example, let’s define three variables ‘x’, ‘y’, and ‘z’, where ‘x’ and ‘z’ are randomly generated data and ‘y’ is defined as
EquationWe will use a scatter plot to find the relationship between the variables ‘x’ and ‘y’.

### Let's define the variables we want to find the relationship between.  
x = np.random.rand(500)  
z = np.random.rand(500)  
### Defining the variable 'y'  
y = x * (z + x)  
fig, ax = plt.subplots()  
ax.set_xlabel('X')  
ax.set_ylabel('Y')  
ax.set_title('Scatter plot between X and Y')  
plt.scatter(x, y, marker='.')  
plt.show()  
Data Visualization: Scatter plot between X & Y
Fig 7. Scatter plot between X and Y.

From the figure above we can see that the data points are very close to each other and also if we fit a curve, along with the points, it will have a positive slope. Therefore, we can infer that there is a strong positive correlation between the values of the variable ‘x’ and variable ‘y’.

Also, we can see that the curve that best fits the graph is quadratic in nature and this can be confirmed by looking at the definition of the variable ‘y’.

Joint Plot

Jointplot is seaborn library specific and can be used to quickly visualize and analyze the relationship between two variables and describe their individual distributions on the same plot.

Let’s start with using joint plot for producing the scatter plot.

### Defining the data.   
mean, covar = [0, 1], [[1, 0,], [0, 50]]  
### Drawing random samples from a multivariate normal distribution.  
### Two random variables are created, each containing 500 values, with the given mean and covariance.  
data = np.random.multivariate_normal(mean, covar, 500)  
### Storing the variables in a dataframe.  
df = pd.DataFrame(data=data, columns=['X', 'Y'])  
### Joint plot between X and Y  
sns.jointplot(df.X, df.Y, kind='scatter')  
plt.show()  
Data Visualisation: Joint plot using seaborn
Fig 8. Joint plot (scatter plot) between X and Y.

Next, we can use the joint point to find the best line or curve that fits the plot.

sns.jointplot(df.X, df.Y, kind='reg')  
plt.show()  
Data visualization: Using joint plot for regression
Fig 9. Using joint plot to plot the regression line that best fits the data points.

Apart from this, jointplot can also be used to plot ‘kde’, ‘hex plot’, and ‘residual plot’.

PairPlot

We can use scatter plot to plot the relationship between two variables. But what if the dataset has more than two variables (which is quite often the case), it can be a tedious task to visualize the relationship between each variable with the other variables.

The seaborn pairplot function does the same thing for us and in just one line of code. It is used to plot multiple pairwise bivariate (two variable) distribution in a dataset. It creates a matrix and plots the relationship for each pair of columns. It also draws a univariate distribution for each variable on the diagonal axes.

### Loading a dataset from the sklearn toy datasets  
from sklearn.datasets import load_linnerud  
### Loading the data  
linnerud_data = load_linnerud()  
### Extracting the column data  
data = linnerud_data.data  

Sklearn stores data in the form of a numpy array and not data frames, thereby storing the data in a dataframe.

### Creating a dataframe  
data = pd.DataFrame(data=data, columns=diabetes_data.feature_names)  
### Plotting a pairplot  
sns.pairplot(data=data)  
Data visualization: Pair plot for relation between columns
Fig 10. Pair plot showing the relationships between the columns of the dataset.

So, in the graph above, we can see the relationships between each of the variables with the other and thus infer which variables are most correlated.

Conclusion

Visualizations play an important role in data analysis and exploration. In this blog, we got introduced to different kinds of plots used for data analysis of continuous variables. Next week, we will explore the various data visualization techniques that can be applied to categorical variables or variables with discrete values. Next, I encourage you to download the iris dataset or any other dataset of your choice and apply and explore the techniques learned in this blog.

Have anything to say? Feel free to comment below for any questions, suggestions, and discussions related to this article. Till then, Sayōnara.

Subscribe to The HackerEarth Blog

Get expert tips, hacks, and how-tos from the world of tech recruiting to stay on top of your hiring!

Author
Shubham Gupta
Calendar Icon
May 10, 2018
Timer Icon
3 min read
Share

Hire top tech talent with our recruitment platform

Access Free Demo
Related reads

Discover more articles

Gain insights to optimize your developer recruitment process.

The Mobile Dev Hiring Landscape Just Changed

Revolutionizing Mobile Talent Hiring: The HackerEarth Advantage

The demand for mobile applications is exploding, but finding and verifying developers with proven, real-world skills is more difficult than ever. Traditional assessment methods often fall short, failing to replicate the complexities of modern mobile development.

Introducing a New Era in Mobile Assessment

At HackerEarth, we're closing this critical gap with two groundbreaking features, seamlessly integrated into our Full Stack IDE:

Article content

Now, assess mobile developers in their true native environment. Our enhanced Full Stack questions now offer full support for both Java and Kotlin, the core languages powering the Android ecosystem. This allows you to evaluate candidates on authentic, real-world app development skills, moving beyond theoretical knowledge to practical application.

Article content

Say goodbye to setup drama and tool-switching. Candidates can now build, test, and debug Android and React Native applications directly within the browser-based IDE. This seamless, in-browser experience provides a true-to-life evaluation, saving valuable time for both candidates and your hiring team.

Assess the Skills That Truly Matter

With native Android support, your assessments can now delve into a candidate's ability to write clean, efficient, and functional code in the languages professional developers use daily. Kotlin's rapid adoption makes proficiency in it a key indicator of a forward-thinking candidate ready for modern mobile development.

Breakup of Mobile development skills ~95% of mobile app dev happens through Java and Kotlin
This chart illustrates the importance of assessing proficiency in both modern (Kotlin) and established (Java) codebases.

Streamlining Your Assessment Workflow

The integrated mobile emulator fundamentally transforms the assessment process. By eliminating the friction of fragmented toolchains and complex local setups, we enable a faster, more effective evaluation and a superior candidate experience.

Old Fragmented Way vs. The New, Integrated Way
Visualize the stark difference: Our streamlined workflow removes technical hurdles, allowing candidates to focus purely on demonstrating their coding and problem-solving abilities.

Quantifiable Impact on Hiring Success

A seamless and authentic assessment environment isn't just a convenience, it's a powerful catalyst for efficiency and better hiring outcomes. By removing technical barriers, candidates can focus entirely on demonstrating their skills, leading to faster submissions and higher-quality signals for your recruiters and hiring managers.

A Better Experience for Everyone

Our new features are meticulously designed to benefit the entire hiring ecosystem:

For Recruiters & Hiring Managers:

  • Accurately assess real-world development skills.
  • Gain deeper insights into candidate proficiency.
  • Hire with greater confidence and speed.
  • Reduce candidate drop-off from technical friction.

For Candidates:

  • Enjoy a seamless, efficient assessment experience.
  • No need to switch between different tools or manage complex setups.
  • Focus purely on showcasing skills, not environment configurations.
  • Work in a powerful, professional-grade IDE.

Unlock a New Era of Mobile Talent Assessment

Stop guessing and start hiring the best mobile developers with confidence. Explore how HackerEarth can transform your tech recruiting.

Vibe Coding: Shaping the Future of Software

A New Era of Code

Vibe coding is a new method of using natural language prompts and AI tools to generate code. I have seen firsthand that this change makes software more accessible to everyone. In the past, being able to produce functional code was a strong advantage for developers. Today, when code is produced quickly through AI, the true value lies in designing, refining, and optimizing systems. Our role now goes beyond writing code; we must also ensure that our systems remain efficient and reliable.

From Machine Language to Natural Language

I recall the early days when every line of code was written manually. We progressed from machine language to high-level programming, and now we are beginning to interact with our tools using natural language. This development does not only increase speed but also changes how we approach problem solving. Product managers can now create working demos in hours instead of weeks, and founders have a clearer way of pitching their ideas with functional prototypes. It is important for us to rethink our role as developers and focus on architecture and system design rather than simply on typing c

The Promise and the Pitfalls

I have experienced both sides of vibe coding. In cases where the goal was to build a quick prototype or a simple internal tool, AI-generated code provided impressive results. Teams have been able to test new ideas and validate concepts much faster. However, when it comes to more complex systems that require careful planning and attention to detail, the output from AI can be problematic. I have seen situations where AI produces large volumes of code that become difficult to manage without significant human intervention.

AI-powered coding tools like GitHub Copilot and AWS’s Q Developer have demonstrated significant productivity gains. For instance, at the National Australia Bank, it’s reported that half of the production code is generated by Q Developer, allowing developers to focus on higher-level problem-solving . Similarly, platforms like Lovable enable non-coders to build viable tech businesses using natural language prompts, contributing to a shift where AI-generated code reduces the need for large engineering teams. However, there are challenges. AI-generated code can sometimes be verbose or lack the architectural discipline required for complex systems. While AI can rapidly produce prototypes or simple utilities, building large-scale systems still necessitates experienced engineers to refine and optimize the code.​

The Economic Impact

The democratization of code generation is altering the economic landscape of software development. As AI tools become more prevalent, the value of average coding skills may diminish, potentially affecting salaries for entry-level positions. Conversely, developers who excel in system design, architecture, and optimization are likely to see increased demand and compensation.​
Seizing the Opportunity

Vibe coding is most beneficial in areas such as rapid prototyping and building simple applications or internal tools. It frees up valuable time that we can then invest in higher-level tasks such as system architecture, security, and user experience. When used in the right context, AI becomes a helpful partner that accelerates the development process without replacing the need for skilled engineers.

This is revolutionizing our craft, much like the shift from machine language to assembly to high-level languages did in the past. AI can churn out code at lightning speed, but remember, “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” Use AI for rapid prototyping, but it’s your expertise that transforms raw output into robust, scalable software. By honing our skills in design and architecture, we ensure our work remains impactful and enduring. Let’s continue to learn, adapt, and build software that stands the test of time.​

Ready to streamline your recruitment process? Get a free demo to explore cutting-edge solutions and resources for your hiring needs.

Guide to Conducting Successful System Design Interviews in 2025

What is Systems Design?

Systems Design is an all encompassing term which encapsulates both frontend and backend components harmonized to define the overall architecture of a product.

Designing robust and scalable systems requires a deep understanding of application, architecture and their underlying components like networks, data, interfaces and modules.

Systems Design, in its essence, is a blueprint of how software and applications should work to meet specific goals. The multi-dimensional nature of this discipline makes it open-ended – as there is no single one-size-fits-all solution to a system design problem.

What is a System Design Interview?

Conducting a System Design interview requires recruiters to take an unconventional approach and look beyond right or wrong answers. Recruiters should aim for evaluating a candidate’s ‘systemic thinking’ skills across three key aspects:

How they navigate technical complexity and navigate uncertainty
How they meet expectations of scale, security and speed
How they focus on the bigger picture without losing sight of details

This assessment of the end-to-end thought process and a holistic approach to problem-solving is what the interview should focus on.

What are some common topics for a System Design Interview

System design interview questions are free-form and exploratory in nature where there is no right or best answer to a specific problem statement. Here are some common questions:

How would you approach the design of a social media app or video app?

What are some ways to design a search engine or a ticketing system?

How would you design an API for a payment gateway?

What are some trade-offs and constraints you will consider while designing systems?

What is your rationale for taking a particular approach to problem solving?

Usually, interviewers base the questions depending on the organization, its goals, key competitors and a candidate’s experience level.

For senior roles, the questions tend to focus on assessing the computational thinking, decision making and reasoning ability of a candidate. For entry level job interviews, the questions are designed to test the hard skills required for building a system architecture.

The Difference between a System Design Interview and a Coding Interview

If a coding interview is like a map that takes you from point A to Z – a systems design interview is like a compass which gives you a sense of the right direction.

Here are three key difference between the two:

Coding challenges follow a linear interviewing experience i.e. candidates are given a problem and interaction with recruiters is limited. System design interviews are more lateral and conversational, requiring active participation from interviewers.

Coding interviews or challenges focus on evaluating the technical acumen of a candidate whereas systems design interviews are oriented to assess problem solving and interpersonal skills.

Coding interviews are based on a right/wrong approach with ideal answers to problem statements while a systems design interview focuses on assessing the thought process and the ability to reason from first principles.

How to Conduct an Effective System Design Interview

One common mistake recruiters make is that they approach a system design interview with the expectations and preparation of a typical coding interview.
Here is a four step framework technical recruiters can follow to ensure a seamless and productive interview experience:

Step 1: Understand the subject at hand

  • Develop an understanding of basics of system design and architecture
  • Familiarize yourself with commonly asked systems design interview questions
  • Read about system design case studies for popular applications
  • Structure the questions and problems by increasing magnitude of difficulty

Step 2: Prepare for the interview

  • Plan the extent of the topics and scope of discussion in advance
  • Clearly define the evaluation criteria and communicate expectations
  • Quantify constraints, inputs, boundaries and assumptions
  • Establish the broader context and a detailed scope of the exercise

Step 3: Stay actively involved

  • Ask follow-up questions to challenge a solution
  • Probe candidates to gauge real-time logical reasoning skills
  • Make it a conversation and take notes of important pointers and outcomes
  • Guide candidates with hints and suggestions to steer them in the right direction

Step 4: Be a collaborator

  • Encourage candidates to explore and consider alternative solutions
  • Work with the candidate to drill the problem into smaller tasks
  • Provide context and supporting details to help candidates stay on track
  • Ask follow-up questions to learn about the candidate’s experience

Technical recruiters and hiring managers should aim for providing an environment of positive reinforcement, actionable feedback and encouragement to candidates.

Evaluation Rubric for Candidates

Facilitate Successful System Design Interview Experiences with FaceCode

FaceCode, HackerEarth’s intuitive and secure platform, empowers recruiters to conduct system design interviews in a live coding environment with HD video chat.

FaceCode comes with an interactive diagram board which makes it easier for interviewers to assess the design thinking skills and conduct communication assessments using a built-in library of diagram based questions.

With FaceCode, you can combine your feedback points with AI-powered insights to generate accurate, data-driven assessment reports in a breeze. Plus, you can access interview recordings and transcripts anytime to recall and trace back the interview experience.

Learn how FaceCode can help you conduct system design interviews and boost your hiring efficiency.

Top Products

Explore HackerEarth’s top products for Hiring & Innovation

Discover powerful tools designed to streamline hiring, assess talent efficiently, and run seamless hackathons. Explore HackerEarth’s top products that help businesses innovate and grow.
Frame
Hackathons
Engage global developers through innovation
Arrow
Frame 2
Assessments
AI-driven advanced coding assessments
Arrow
Frame 3
FaceCode
Real-time code editor for effective coding interviews
Arrow
Frame 4
L & D
Tailored learning paths for continuous assessments
Arrow
Get A Free Demo