How to Create & Interpret a Q-Q Plot in R (2024)

A Q-Q plot, short for “quantile-quantile” plot, is a type of plot that we can use to determine whether or not a set of data potentially came from some theoretical distribution.

Many statistical tests make the assumption that a set of data follows a normal distribution, and a Q-Q plot is often used to assess whether or not this assumption is met.

Although a Q-Q plot isn’t a formal statistical test, it does provide an easy way to visually check whether a dataset follows a normal distribution, and if not, how this assumption is violated and which data points potentially cause this violation.

We can create a Q-Q plot by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, then the points on the plot should roughly form a straight diagonal line.

Quantiles represent points in a dataset below which a certain portion of the data falls. For example, the 0.9 quantile is the point below which 90% of the data falls. The 0.5 quantile is the point below which 50% of the data falls, and so on.
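As a small illustration of this definition, the sketch below computes the 0.5 and 0.9 quantiles of a toy vector with the base R quantile() function. The vector x (the integers 1 to 100) is an assumption made for this example only and does not appear elsewhere in the article.

```r
# Illustrative data (an assumption for this sketch): the integers 1 to 100
x <- 1:100

# 0.5 and 0.9 quantiles using R's default interpolation method (type = 7)
q <- quantile(x, probs = c(0.5, 0.9))
print(q)

# Share of observations that fall below the 0.9 quantile
share_below <- mean(x < q[2])
print(share_below)
```

The share printed at the end should be close to 0.9, which is what the 0.9 quantile is meant to capture.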

Q-Q plots identify the quantiles in your sample data and plot them against the quantiles of a theoretical distribution. In most cases the normal distribution is used, but a Q-Q plot can actually be created for any theoretical distribution.

If the data points fall along a straight diagonal line in a Q-Q plot, then the dataset likely follows a normal distribution.

How to Create a Q-Q Plot in R

We can easily create a Q-Q plot to check if a dataset follows a normal distribution by using the built-in qqnorm() function.

For example, the following code generates a vector of 100 random values that follow a normal distribution and creates a Q-Q plot for this dataset to verify that it does indeed follow a normal distribution:

# make this example reproducible
set.seed(11)
# generate vector of 100 values that follows a normal distribution
data <- rnorm(100)
# create Q-Q plot to compare this dataset to a theoretical normal distribution
qqnorm(data)

To make it even easier to see if the data falls along a straight line, we can use the qqline() function:

# create Q-Q plot
qqnorm(data)
# add straight diagonal line to plot
qqline(data)

We can see that the data points near the tails don’t fall exactly along the straight line, but for the most part this sample data appears to be normally distributed (as it should be since we told R to generate the data from a normal distribution).

Consider instead the following code that generates a vector of 100 random values that follow a gamma distribution and creates a Q-Q plot for this data to check if it follows a normal distribution:

# make this example reproducible
set.seed(11)
# generate vector of 100 values that follows a gamma distribution
data <- rgamma(100, 1)
# create Q-Q plot to compare this dataset to a theoretical normal distribution
qqnorm(data)
qqline(data)

We can see the clear departure from the straight line in this Q-Q plot, indicating that this dataset likely does not follow a normal distribution.

Consider another chunk of code that generates a vector of 100 random values that follow a Chi-Square distribution with 5 degrees of freedom and creates a Q-Q plot for this data to check if it follows a normal distribution:

# make this example reproducible
set.seed(11)
# generate vector of 100 values that follows a Chi-Square distribution
data <- rchisq(100, 5)
# create Q-Q plot to compare this dataset to a theoretical normal distribution
qqnorm(data)
qqline(data)

Once again we can see that this dataset does not appear to follow a normal distribution, especially near the tails.
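Since a Q-Q plot can be drawn against any theoretical distribution, one possible follow-up is to compare the same kind of sample to Chi-Square quantiles instead of normal ones. The sketch below is one way to do this with the base R functions qqplot(), qchisq() and ppoints(); the variable names theoretical and qq, and the choice of a y = x reference line, are assumptions made for illustration.

```r
# make this example reproducible
set.seed(11)

# generate vector of 100 values that follows a Chi-Square distribution (5 df)
data <- rchisq(100, 5)

# theoretical Chi-Square quantiles (5 df) at evenly spread probability points
theoretical <- qchisq(ppoints(length(data)), df = 5)

# plot sample quantiles against the theoretical Chi-Square quantiles
qq <- qqplot(theoretical, data,
             main = 'Chi-Square Q-Q Plot',
             xlab = 'Theoretical Quantiles (Chi-Square, 5 df)',
             ylab = 'Sample Quantiles')

# both axes are on the same scale here, so y = x is a natural reference line
abline(0, 1, col = 'red')
```

If the sample really comes from this Chi-Square distribution, the points should stay close to the red line, with some scatter in the upper tail.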

Modifying the Aesthetics of a Q-Q Plot in R

We can modify some of the aesthetics of the Q-Q plot in R including the title, axis labels, data point colors, line color, and line width.

The following code modifies the titles, axis labels, and color of the points in the plot:

# make this example reproducible
set.seed(11)
# generate vector of 100 values that follows a normal distribution
data <- rnorm(100)
# create Q-Q plot
qqnorm(data, main = 'Q-Q Plot for Normality', xlab = 'Theoretical Dist', ylab = 'Sample dist', col = 'steelblue')

Next, the following code adds a straight diagonal line to the plot with a color of red, a line width of 2 (lwd = 2, default is 1), and a dashed line (lty = 2, default is 1):

qqline(data, col = 'red', lwd = 2, lty = 2)

Technical Notes

Keep in mind that a Q-Q plot is simply a way to visually check if a dataset follows a theoretical distribution. To formally test whether or not a dataset follows a particular distribution, the following tests can be performed (assuming you’re comparing your dataset to a normal distribution):

Anderson-Darling Test
Shapiro-Wilk Test
Kolmogorov-Smirnov Test
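
Two of these tests are available in base R, and a minimal sketch of how they might be run is shown below. The variable names sw and ks are assumptions for illustration. The Anderson-Darling test is not part of base R; add-on packages such as nortest provide it, so it is left out here. Note also that when the mean and standard deviation passed to ks.test() are estimated from the same sample, the reported p-value is only approximate.

```r
# make this example reproducible
set.seed(11)

# generate vector of 100 values that follows a normal distribution
data <- rnorm(100)

# Shapiro-Wilk test of normality
sw <- shapiro.test(data)

# Kolmogorov-Smirnov test against a normal distribution whose parameters
# are estimated from the sample (p-value is approximate in this case)
ks <- ks.test(data, 'pnorm', mean = mean(data), sd = sd(data))

# a large p-value means the test found no evidence against normality
print(sw$p.value)
print(ks$p.value)
```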

FAQs

How to Create & Interpret a Q-Q Plot in R? ›

In R, there are two functions to create QQ plots: qqnorm() and qqplot() . qqnorm() creates a normal QQ plot. You give it a vector of data, and R plots the data in sorted order versus quantiles from a standard normal distribution. For example, consider the trees data set that comes with R.
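
A short sketch of what this could look like with the trees data set follows. The choice of the Height column and the plot title are assumptions for illustration; the answer above does not say which column it had in mind.

```r
# trees ships with base R: Girth, Height and Volume of 31 black cherry trees
qq <- qqnorm(trees$Height, main = 'Normal Q-Q Plot of Tree Height')

# add the reference line through the first and third quartiles
qqline(trees$Height)
```

qqnorm() invisibly returns the plotted coordinates, so qq$x holds the theoretical quantiles and qq$y holds the data values.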

How to interpret Q-Q plot in R? ›

The vertical axis shows the observed sample values. The horizontal axis shows the value expected at the same quantile if the distribution were normal (labeled “Theoretical Quantiles” in R). The QQ plot should follow more or less along a straight line if the data come from a normal distribution (with some tolerance for sampling variation).

How to draw the Q-Q plot? ›

Plotting:
  1. Sort the dataset values and plot them on the y-axis (the convention used by qqnorm() in R).
  2. Plot the corresponding theoretical quantiles on the x-axis.
  3. Each data point (x, y) represents a pair of expected and observed values.
  4. Inspect how closely the points follow a straight line to judge the relationship between the dataset and the theoretical distribution.
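
The steps above can be sketched by hand in R. This is only an illustration: it follows the qqnorm() convention of putting theoretical quantiles on the horizontal axis, and the names sample_q, probs and theoretical_q are assumptions for this example.

```r
# make this example reproducible
set.seed(11)
data <- rnorm(100)

# step 1: sort the observed values
sample_q <- sort(data)

# step 2: pick one probability point per observation and convert
# each to the matching quantile of the standard normal distribution
probs <- ppoints(length(data))
theoretical_q <- qnorm(probs)

# steps 3 and 4: plot the pairs and inspect how straight the pattern is
plot(theoretical_q, sample_q,
     main = 'Manual Normal Q-Q Plot',
     xlab = 'Theoretical Quantiles',
     ylab = 'Sample Quantiles')
```

The result should match what qqnorm(data) draws, since qqnorm() relies on the same ppoints() probability points.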

How do you interpret a Q-Q plot in a linear regression model? ›

If the two distributions being compared are similar, the points in the Q–Q plot will approximately lie on the identity line y = x. If the distributions are linearly related, the points in the Q–Q plot will approximately lie on a line, but not necessarily on the line y = x.

What is the difference between a quantile plot and a Q-Q plot? ›

A Q-Q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the point below which a given fraction (or percent) of the data falls. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.

How to explain Q-Q plot? ›

Technically speaking, a Q-Q plot compares the distribution of two sets of data. In most cases, a probability plot will be most useful. A probability plot compares the distribution of a data set with a theoretical distribution. The R function qqnorm() compares a data set with the theoretical normal distribution.

What should a Q-Q plot of residuals look like? ›

Normal Q-Q plot: this is used to assess whether your residuals are normally distributed. What you are looking for is the data points closely following the straight reference line, which rises at roughly a 45-degree angle from left to right.
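
As a hedged sketch of how such a residual check might be set up in R, the code below fits a simple regression on the built-in trees data set and plots its standardized residuals. The model Volume ~ Girth and the names fit and res are assumptions chosen only for illustration.

```r
# fit a simple linear regression on a data set that ships with R
fit <- lm(Volume ~ Girth, data = trees)

# standardized residuals, so the sample quantiles are on a z-score-like scale
res <- rstandard(fit)

# normal Q-Q plot of the residuals with a reference line
qq <- qqnorm(res, main = 'Normal Q-Q Plot of Residuals')
qqline(res)
```

Calling plot(fit, which = 2) produces a similar diagnostic plot directly from the fitted model.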

What is the Z score in Q-Q plot? ›

When the option Q-Q plot is selected, the horizontal axis shows the z-scores of the observed values, z=(x−mean)/SD. A straight reference line represents the Normal distribution. If the sample data are near a Normal distribution, the data points will be near this straight line.

What does the slope of a Q-Q plot mean? ›

The slope of the Q-Q plot reflects the ratio of the standard deviation of your data to the standard deviation of the normal distribution. If the slope is greater than 1, it means that your data are more spread out than the normal distribution, and vice versa.
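
To make this concrete, the sketch below draws a sample with a standard deviation of 3 and computes the slope of the line that qqline() draws by default, which passes through the first and third quartiles. The sample size of 1000 and the names probs, slope and intercept are assumptions for illustration.

```r
# make this example reproducible
set.seed(11)

# sample that is three times as spread out as a standard normal
data <- rnorm(1000, mean = 0, sd = 3)

# qqline() by default joins the first and third quartiles
probs <- c(0.25, 0.75)
y <- quantile(data, probs)   # sample quartiles
x <- qnorm(probs)            # standard normal quartiles

# slope and intercept of the reference line
slope <- unname(diff(y) / diff(x))
intercept <- unname(y[1] - slope * x[1])

print(slope)
print(intercept)
```

The slope should come out near 3, in line with the ratio of standard deviations described above.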

What does a good Q-Q plot look like? ›

If the two distributions that we are comparing are exactly equal, then the points on the Q-Q plot will lie perfectly on the straight line y = x. A normal Q-Q plot helps us judge whether a data set is approximately normally distributed.

Why is Q-Q plot better than histogram? ›

For several reasons, it's easier to use a QQ plot than a histogram to see if your data follow a distribution. For starters, you can more accurately determine whether dots follow a line than seeing if histogram bars fit a curve. Additionally, a histogram's appearance depends on the sample size and the number of bars.

What are the disadvantages of Q-Q plot? ›

Quantile-Quantile (Q-Q) plots are often difficult to interpret because it is unclear how large the deviation from the theoretical distribution must be to indicate a lack of fit.

How do you know if a Q-Q plot is right skewed? ›

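When the lower end of the Q-Q plot deviates from a straight line while the upper end follows one, the distribution has a longer tail to its left and is left-skewed, also called negatively skewed. When we see the upper end of the Q-Q plot deviate from a straight line while the lower end follows one, then the distribution has a longer tail to its right and is right-skewed, also called positively skewed.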

How do you interpret P-P plot and Q-Q plot? ›

A P-P plot compares the empirical cumulative distribution function of a data set with a specified theoretical cumulative distribution function F(·). A Q-Q plot compares the quantiles of a data distribution with the quantiles of a standardized theoretical distribution from a specified family of distributions.

What does a heavy tailed Q-Q plot mean? ›

A heavy-tailed Q-Q plot means that, compared to the normal distribution, there is much more data located at the extremes of the distribution and less data in the center of the distribution.
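
One way to see this pattern is to plot a sample from a distribution known to have heavy tails. The sketch below uses a t distribution with 2 degrees of freedom; that choice, the sample size of 200, and the name heavy are assumptions for illustration.

```r
# make this example reproducible
set.seed(11)

# t distribution with 2 degrees of freedom has much heavier tails than the normal
heavy <- rt(200, df = 2)

# normal Q-Q plot with a reference line
qq <- qqnorm(heavy, main = 'Q-Q Plot of Heavy-Tailed Data')
qqline(heavy, col = 'red')
```

In a plot like this, the points at the far left typically fall below the line and the points at the far right rise above it, while the middle of the plot stays close to the line.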

Article information

Author: Maia Crooks Jr
