Violin Plot
What is a violin plot?
A violin plot is a hybrid of a box plot and a mirrored density plot, used to visualize the underlying distribution of a dataset. The probability density of the data is estimated by smoothing with a Gaussian kernel, and wider sections of the violin mark the values where more data points are concentrated.
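The smoothing step can be sketched in a few lines of Python. This is a minimal illustration rather than the chart's actual implementation: the function name `gaussian_kde`, the sample values, and the bandwidth choice (a normal-reference rule of thumb, 1.06 * SD * n^(-1/5)) are all assumptions made for the example.

```python
import math
import statistics


def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each point in grid."""
    n = len(samples)
    if bandwidth is None:
        # Assumed bandwidth: normal-reference rule of thumb.
        # The chart may use a different rule.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    # Each sample contributes one Gaussian bump; the estimate is their average.
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Hypothetical data, clustered around 2 with one high value.
samples = [1.0, 1.2, 1.9, 2.0, 2.1, 2.2, 2.4, 3.5]
lo, hi = min(samples) - 1.0, max(samples) + 1.0
grid = [lo + i * (hi - lo) / 49 for i in range(50)]
density = gaussian_kde(samples, grid)

# Scaling the density and mirroring it around a centre line
# gives the half-width of the violin at each grid value.
half_widths = [d / max(density) for d in density]
```

The violin is widest where `half_widths` is largest, which here falls near the cluster of values around 2. A larger bandwidth gives a smoother, less detailed outline, and a smaller one gives a bumpier outline.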
Traditionally, a box plot is layered on top of the density plot, with the "middle dot" representing the median value, the "box" displaying the interquartile range (IQR), and the "whiskers" (thin line) showing the range, Tukey's fences, or a percentile interval.
When to use a violin plot
When combined with a distribution dot plot, "bar", "spread" and "box" plots can provide as rich an understanding of the underlying data distribution as violin plots, including information about modality and skewness. Unlike violins, they can also show the number of observations.
Violin plots excel with larger datasets, where there are enough data points to estimate the probability density reliably and the sheer volume of data would make a distribution dot plot look cluttered and dense.
Chart properties
Prop | Default | Description |
---|---|---|
central tendency | median | median: the middle value of a sorted set of numbers; mean: the sum of a set of values divided by the number of values in the set. |
whiskers | range | The central tendency each option is used with is given in parentheses. range (median or mean): the span from the lowest to the highest value in the set; 1.5 * interquartile range, 1.5*IQR (median): the span from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR; 2.5 percentile - 97.5 percentile, 2.5-97.5 %tile (median): the span from the 2.5th to the 97.5th percentile, covering the middle 95% of the set; standard error of the mean, SEM (mean): an estimate of how far the sample mean is likely to lie from the population mean; standard deviation, SD (mean): a measure of the variation of a set of values around their mean; 95% confidence interval, 95% CI (mean): a range constructed so that, across repeated samples, 95% of such ranges would contain the population mean. |
sort | none | none: the dataset is kept in insertion order; ascending: the dataset is arranged from smallest to largest value; descending: the dataset is arranged from largest to smallest value. |
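The whisker options in the table can be made concrete with a short Python sketch. The function `whisker_bounds` and its `kind` strings are hypothetical and simply reuse the table's labels. The sketch also assumes that SD and SEM whiskers extend one unit either side of the mean, that the 95% CI uses a normal approximation (z = 1.96), and that quartiles and percentiles are linearly interpolated. The chart itself may compute these differently.

```python
import math
import statistics


def whisker_bounds(values, kind="range"):
    """Return (low, high) whisker endpoints for one group of values."""
    n = len(values)
    if kind == "range":
        return min(values), max(values)
    if kind == "1.5*IQR":
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        # Uses the fences themselves, as the table describes them.
        return q1 - 1.5 * iqr, q3 + 1.5 * iqr
    if kind == "2.5-97.5 %tile":
        # 40 equal slices give cut points every 2.5 percentage points.
        cuts = statistics.quantiles(values, n=40, method="inclusive")
        return cuts[0], cuts[-1]
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if kind == "SD":
        return mean - sd, mean + sd
    sem = sd / math.sqrt(n)
    if kind == "SEM":
        return mean - sem, mean + sem
    if kind == "95% CI":
        # Assumed normal approximation; a t-based interval is wider for small n.
        return mean - 1.96 * sem, mean + 1.96 * sem
    raise ValueError(f"unknown whisker kind: {kind}")


# Hypothetical group of nine observations.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for kind in ("range", "1.5*IQR", "2.5-97.5 %tile", "SEM", "SD", "95% CI"):
    low, high = whisker_bounds(values, kind)
    print(f"{kind:>15}: {low:.2f} to {high:.2f}")
```

The first three options describe the spread of the data itself, while SEM and 95% CI describe uncertainty about the mean and shrink as the number of observations grows.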