Violin Plot in R – How to Draw It –

There are several ways to visualize data, and one of them is the violin plot! This method is useful for comparing multiple sets of data, and it has an appealing appearance.

First, you’ll need to prepare your data. While you’re at it, you can also check the data distribution using a dot plot.

Data Generation and Distribution Checking

Now, let’s create some data. If you have your own data, feel free to use that instead.

Code example
# Set working directory (change or remove this line if the folder does not exist)
setwd("~/Rpractice/")

# Load tidyverse (provides ggplot2 and tidyr) and ggbeeswarm
library(tidyverse)
library(ggbeeswarm)

# Generate sample data: three groups of 100 normally distributed values
set.seed(1)  # added so the example is reproducible
dat <- list(
  X = rnorm(100, 5, 10),
  Y = rnorm(100, 20, 10),
  Z = rnorm(100, 15, 15)
)

# Combine into a data frame, then reshape to long format with pivot_longer
df <- data.frame(matrix(unlist(dat), nrow = 100))
colnames(df) <- c("A", "B", "C")
df.long <- pivot_longer(df, cols = A:C, names_to = "Categories", values_to = "Values")
 
# draw dotplot
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_beeswarm(aes(color = Categories),
                size = 2,
                cex = 2,
                alpha = .8)+
  theme_classic()+
  theme(legend.position = "none")

The dot plot shows every observation, so you can check the sample size, spread, and any outliers in each group before summarizing the data as a violin plot.
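
As a numeric companion to the dot plot, the sketch below computes the sample size, mean, and standard deviation of each group with base R. The data frame is rebuilt inside the block so that it runs by itself; the seed and the name group_summary are assumptions made for illustration.

```r
# Rebuild comparable sample data in long format (base R only)
set.seed(1)  # assumed seed, only for reproducibility
df.long <- data.frame(
  Categories = rep(c("A", "B", "C"), each = 100),
  Values = c(rnorm(100, 5, 10), rnorm(100, 20, 10), rnorm(100, 15, 15))
)

# Split the values by group, then summarize each group
groups <- split(df.long$Values, df.long$Categories)
group_summary <- data.frame(
  Categories = names(groups),
  n    = sapply(groups, length),
  mean = sapply(groups, mean),
  sd   = sapply(groups, sd),
  row.names = NULL
)
print(group_summary)
```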

Draw a Simple Violin Plot

To draw a violin plot with ggplot2, use the geom_violin() function. For a clean, simple look, apply theme_classic(), which gives a white background with axis lines and no grid lines.

Code example
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_violin()+
  theme_classic()

The ends of the violins may look cut off. Overlaying the individual data points on the violin plot shows where the data actually end. To do so, you can use geom_dotplot(), which is part of ggplot2, or geom_beeswarm(), which comes from the ggbeeswarm package.

Code example
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_violin()+
  geom_beeswarm(aes(color = Categories),
                size = 2,
                cex = 2,
                alpha = .8)+
  theme_classic()
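
For comparison, here is a rough sketch of the same overlay using geom_dotplot() instead of geom_beeswarm(), so that only ggplot2 is needed. The data are rebuilt with base R to keep the block self-contained, and the binwidth value is an assumption that you would likely need to tune for your own data.

```r
library(ggplot2)

# Rebuild comparable sample data in long format
set.seed(1)  # assumed seed, only for reproducibility
df.long <- data.frame(
  Categories = rep(c("A", "B", "C"), each = 100),
  Values = c(rnorm(100, 5, 10), rnorm(100, 20, 10), rnorm(100, 15, 15))
)

p_dot <- ggplot(df.long, aes(x = Categories, y = Values)) +
  geom_violin() +
  geom_dotplot(aes(fill = Categories),
               binaxis = "y",        # bin the points along the y axis
               stackdir = "center",  # stack dots symmetrically around the center
               binwidth = 1.5,       # assumed value; adjust to your data
               alpha = .8) +
  theme_classic() +
  theme(legend.position = "none")
p_dot
```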

You can see that the ends of the violins are cut off at the minimum and maximum values, because geom_violin() trims the tails to the range of the data by default (trim = TRUE). If you don’t want the ends cut off, use geom_violin(trim = FALSE).

Code example
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_violin(trim = FALSE)+
  geom_beeswarm(aes(color = Categories),
                size = 2,
                cex = 2,
                alpha = .8)+
  theme_classic()

With trim = FALSE, the violins extend above and below the observed range, into areas where there are no data points, because the tails of the kernel density estimate are drawn in full.
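
To see why untrimmed tails reach past the data, the sketch below uses base R's density() as a stand-in for the kernel density estimate behind a violin. It is only an illustration: the exact settings geom_violin() uses may differ, and the seed and variable names are assumptions.

```r
set.seed(1)  # assumed seed, only for reproducibility
x <- rnorm(100, 5, 10)

# Default Gaussian kernel; the evaluation grid extends
# three bandwidths beyond the smallest and largest observation
d <- density(x)

range(x)    # observed minimum and maximum
range(d$x)  # range over which the density is evaluated (wider than the data)
```

Trimming simply stops drawing the curve at range(x), which is why the default violins end abruptly.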

Add Color to Violin Plot

Add Color to the Borders

To color the border of each violin by group, map color to the grouping variable with aes(color = ). The mapping can go inside ggplot(), as in the example here, or inside geom_violin().

Code example
ggplot(df.long, aes(x = Categories, y = Values, color = Categories))+
  geom_violin()+
  theme_classic()

Fill the Interior of the Violin Plot

If you want to fill the interior of each violin with a color per group, map fill to the grouping variable with aes(fill = ). As with color, the mapping can go inside ggplot() or inside geom_violin().

Code example
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  theme_classic()
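
If you want every violin to have the same border and fill rather than one color per group, a minimal sketch is to set color and fill as fixed arguments outside aes(). The color names here are arbitrary choices, and the data are rebuilt so that the block runs by itself.

```r
library(ggplot2)

# Rebuild comparable sample data in long format
set.seed(1)  # assumed seed, only for reproducibility
df.long <- data.frame(
  Categories = rep(c("A", "B", "C"), each = 100),
  Values = c(rnorm(100, 5, 10), rnorm(100, 20, 10), rnorm(100, 15, 15))
)

# Fixed values go outside aes(): no legend is created
p_fixed <- ggplot(df.long, aes(x = Categories, y = Values)) +
  geom_violin(color = "steelblue", fill = "grey90") +
  theme_classic()
p_fixed
```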

There are several ways to change the color of the violin plot. One way is to use the scale_fill_brewer() function to specify the color scheme.

Code example
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  scale_fill_brewer(palette = "Set2")+
  theme_classic()
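
Another of those ways is to pick the colors yourself with scale_fill_manual(). The sketch below assigns one hex color to each category by name; the specific colors are arbitrary, and the data are rebuilt so that the block is self-contained.

```r
library(ggplot2)

# Rebuild comparable sample data in long format
set.seed(1)  # assumed seed, only for reproducibility
df.long <- data.frame(
  Categories = rep(c("A", "B", "C"), each = 100),
  Values = c(rnorm(100, 5, 10), rnorm(100, 20, 10), rnorm(100, 15, 15))
)

# Named vector: each category level gets an explicit fill color
my_colors <- c(A = "#1b9e77", B = "#d95f02", C = "#7570b3")

p_manual <- ggplot(df.long, aes(x = Categories, y = Values, fill = Categories)) +
  geom_violin() +
  scale_fill_manual(values = my_colors) +
  theme_classic()
p_manual
```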

Add Mean or Median to the Violin Plot

To add the mean or median to the violin plot, you can use the stat_summary() function.

Add Mean

Code example
ggplot(df.long, aes(x = Categories, y = Values, color = Categories))+
  geom_violin()+
  stat_summary(fun = mean, geom = "point",
               shape = 16, size = 2, color = "red")+
  theme_classic()

The shape parameter in stat_summary() takes the same values as pch in base R. For example, 16 is a filled circle and 3 is a plus sign.

Add Median

Code example
ggplot(df.long, aes(x = Categories, y = Values, color = Categories))+
  geom_violin()+
  stat_summary(fun = median, geom = "point",
               shape = 3, size = 2, color = "red")+
  theme_classic()
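
The two summaries can also be drawn together, which makes any gap between mean and median visible. This is a sketch under the same assumptions as before (rebuilt data, arbitrary colors); the tapply() calls at the end give the plotted values as numbers for a cross-check.

```r
library(ggplot2)

# Rebuild comparable sample data in long format
set.seed(1)  # assumed seed, only for reproducibility
df.long <- data.frame(
  Categories = rep(c("A", "B", "C"), each = 100),
  Values = c(rnorm(100, 5, 10), rnorm(100, 20, 10), rnorm(100, 15, 15))
)

p_both <- ggplot(df.long, aes(x = Categories, y = Values)) +
  geom_violin() +
  # Mean as a red filled circle
  stat_summary(fun = mean, geom = "point",
               shape = 16, size = 2, color = "red") +
  # Median as a blue plus sign
  stat_summary(fun = median, geom = "point",
               shape = 3, size = 2, color = "blue") +
  theme_classic()
p_both

# Numeric cross-check of the plotted summaries
group_means   <- tapply(df.long$Values, df.long$Categories, mean)
group_medians <- tapply(df.long$Values, df.long$Categories, median)
group_means
group_medians
```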

Change the Degree of Smoothing

To change the degree of smoothing in the violin plot, use the adjust parameter inside the geom_violin() function. It multiplies the bandwidth of the density estimate, and its default value is 1.

To decrease the degree of smoothing, set adjust to a smaller value, for example 0.2:

Code example
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin(adjust = .2)+
  theme_classic()

Conversely, if you want to increase the degree of smoothing in the violin plot, you can set the adjust parameter to a larger value.

Code example
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin(adjust = 2)+
  theme_classic()

If you increase the degree of smoothing too much, the violin plot can become overly smoothed and lose important details.
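
The effect of adjust can be checked numerically with base R's density(), which has a parameter of the same name that scales the bandwidth. This is offered as an illustration of the idea rather than a description of geom_violin() internals; the seed and variable names are assumptions.

```r
set.seed(1)  # assumed seed, only for reproducibility
x <- rnorm(100, 5, 10)

bw_default <- density(x)$bw                # adjust = 1
bw_rough   <- density(x, adjust = 0.2)$bw  # smaller bandwidth, bumpier curve
bw_smooth  <- density(x, adjust = 2)$bw    # larger bandwidth, smoother curve

# The bandwidth scales linearly with adjust
c(rough = bw_rough, default = bw_default, smooth = bw_smooth)
```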

Overlaying a Box Plot

Overlaying a box plot on a violin plot is a common technique in data visualization: the violin shows the shape of the distribution, while the box plot adds the median and quartiles.

Code example
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  geom_boxplot(width = .1, fill = "white")+
  theme_classic()

To hide the outliers of the box plot when overlaying it on a violin plot, set outlier.color = NA inside the geom_boxplot() function.

Code example
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  geom_boxplot(width = .1, fill = "white", outlier.color = NA)+
  theme_classic()
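
An alternative sketch for the same result is outlier.shape = NA, which the geom_boxplot() documentation mentions as a way to hide outliers. The data are rebuilt here so that the block runs by itself.

```r
library(ggplot2)

# Rebuild comparable sample data in long format
set.seed(1)  # assumed seed, only for reproducibility
df.long <- data.frame(
  Categories = rep(c("A", "B", "C"), each = 100),
  Values = c(rnorm(100, 5, 10), rnorm(100, 20, 10), rnorm(100, 15, 15))
)

p_box <- ggplot(df.long, aes(x = Categories, y = Values, fill = Categories)) +
  geom_violin() +
  # outlier.shape = NA suppresses the outlier points entirely
  geom_boxplot(width = .1, fill = "white", outlier.shape = NA) +
  theme_classic()
p_box
```

Either way, the outliers are only hidden from the box plot; they still contribute to the shape of the violin.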

You can also fill the box plot with black and mark the median with a white circle.

Code example
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_violin()+
  geom_boxplot(width = .1, fill = "black", outlier.color = NA) +
  stat_summary(fun = median, geom = "point", fill = "white", shape = 21, size = 3) +
  theme_classic()

With the information provided, I believe you can now create a violin plot. I hope this guidance proves helpful!