<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Statistical Analysis｜ドクターフント(Dr. Hund)</title>
	<atom:link href="https://brain-storm.space/category/statistical-analysis/feed/" rel="self" type="application/rss+xml" />
	<link>https://brain-storm.space</link>
	<description>脳や研究について発信するブログです。This site is for research and statistics.</description>
	<lastBuildDate>Fri, 24 Mar 2023 17:13:38 +0000</lastBuildDate>
	<language>ja</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.1.1</generator>

<image>
	<url>https://brain-storm.space/wp-content/uploads/2021/04/cropped-3d0209af428738b78799159b4ce75ad9-32x32.png</url>
	<title>Statistical Analysis｜ドクターフント(Dr. Hund)</title>
	<link>https://brain-storm.space</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>[R beginners] Drawing ROC curve</title>
		<link>https://brain-storm.space/roc-curve-en/1304/</link>
					<comments>https://brain-storm.space/roc-curve-en/1304/#respond</comments>
		
		<dc:creator><![CDATA[brainblog]]></dc:creator>
		<pubDate>Sat, 25 Mar 2023 07:00:00 +0000</pubDate>
				<category><![CDATA[Statistical Analysis]]></category>
		<category><![CDATA[pROC]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ROC]]></category>
		<category><![CDATA[sensitivity]]></category>
		<category><![CDATA[specificity]]></category>
		<guid isPermaLink="false">https://brain-storm.space/?p=1304</guid>

					<description><![CDATA[I will introduce how to draw an ROC curve. In R, it is very easy to draw an ROC curve. There are many librarie]]></description>
										<content:encoded><![CDATA[
<p>This post shows how to draw an ROC curve in R, which is very easy to do. Many packages can draw ROC curves; this time I use pROC, which is widely used and easy to work with.</p>



<h2>Creating Sample Data</h2>



<p>This section creates sample data; if you have your own data, skip to the next section. The scenario: 50 healthy people and 50 people with a disease are measured with two assays, &#8220;assay1&#8221; and &#8220;assay2&#8221;, and we want to know how well each assay detects the disease.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
set.seed(3)
condition &lt;- rep(c(&quot;healthy&quot;, &quot;disease&quot;), each = 50)
assay1 &lt;- c(rnorm(40, mean = 1, sd = 2), 
            rnorm(10, mean = 3, sd = 1), 
            rnorm(40, mean = 5, sd = 2), 
            rnorm(10, mean = 7, sd = 1))
assay2 &lt;- c(rnorm(30, mean = 1, sd = 3), 
            rnorm(20, mean = 2, sd = 2), 
            rnorm(30, mean = 3, sd = 3), 
            rnorm(20, mean = 4, sd = 2))

df &lt;- data.frame(condition, assay1, assay2)
</pre></div>


<p>The distributions look like this (plotted with the beeswarm package):</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
library(beeswarm)
beeswarm(data = df, assay1 ~ condition, 
         col = c(&quot;red&quot;,&quot;blue&quot;), pch=19)
beeswarm(data = df, assay2 ~ condition,
         col = c(&quot;red&quot;, &quot;blue&quot;), pch=19)
</pre></div>


<figure class="wp-block-image size-full is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2023/03/dotplot_assay1.jpeg" alt="" class="wp-image-1220" width="400" height="400"/><figcaption class="wp-element-caption">Assay1</figcaption></figure>



<figure class="wp-block-image size-full is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2023/03/dotplot_assay2.jpeg" alt="" class="wp-image-1221" width="400" height="400"/><figcaption class="wp-element-caption">Assay2</figcaption></figure>



<h2>Use pROC</h2>



<h3>Install pROC package</h3>



<p>First, you need to install and load the pROC package to use it in R.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
install.packages(&quot;pROC&quot;)
library(pROC)
</pre></div>


<h3>Draw a Single ROC Curve</h3>



<p>The data frame looks like this:</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
&gt; head(df)
  condition     assay1     assay2
1   healthy -0.9238668  2.8520501
2   healthy  0.4149486 -0.2152325
3   healthy  1.5175764  4.1593113
4   healthy -1.3042638  2.8068527
5   healthy  1.3915657  4.0523835
6   healthy  1.0602479  2.8245020
</pre></div>


<p>There are several ways to proceed from here, but first let&#8217;s use <code>roc()</code> to compute the ROC curve and store the result in an object.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
roc1 &lt;- roc(df$condition, df$assay1, ci=TRUE,
            levels=c(&quot;healthy&quot;, &quot;disease&quot;))
</pre></div>


<p>As an alternative, you can also use the following notation with &#8220;~&#8221;:</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
roc1 &lt;- roc(condition ~ assay1, data=df, ci=TRUE,
            levels=c(&quot;healthy&quot;, &quot;disease&quot;))
</pre></div>


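<p>The response (<code>condition</code>) must have two classes; a 0/1 coding works as well. <code>levels</code> gives the value for controls first and for cases second, here &#8220;healthy&#8221; and &#8220;disease&#8221;. <code>ci=TRUE</code> also computes a 95% confidence interval for the AUC (by DeLong&#8217;s method by default).</p>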
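


<p>To get a feel for what <code>roc()</code> computes, here is a minimal base-R sketch (no pROC needed). It regenerates <code>assay1</code>, treats &#8220;disease&#8221; as the case level, and assumes that higher values indicate disease. It computes sensitivity and specificity with each observed value used as a threshold, and the empirical AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. pROC itself places thresholds between neighboring values and chooses the direction automatically, so treat this as an illustration rather than its exact implementation.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Regenerate the sample data from the first code block (assay1 only)
set.seed(3)
condition &lt;- rep(c(&quot;healthy&quot;, &quot;disease&quot;), each = 50)
assay1 &lt;- c(rnorm(40, mean = 1, sd = 2), rnorm(10, mean = 3, sd = 1),
            rnorm(40, mean = 5, sd = 2), rnorm(10, mean = 7, sd = 1))

cases    &lt;- assay1[condition == &quot;disease&quot;]  # diseased people
controls &lt;- assay1[condition == &quot;healthy&quot;]  # healthy people

# Simplification: every observed value is tried as a threshold, and a
# value &gt;= the threshold counts as positive
thresholds &lt;- sort(unique(assay1))
roc_pts &lt;- data.frame(
  threshold   = thresholds,
  sensitivity = sapply(thresholds, function(t) mean(cases &gt;= t)),
  specificity = sapply(thresholds, function(t) mean(controls &lt; t))
)
head(roc_pts)

# Empirical AUC: probability that a case scores higher than a control,
# with ties counted as 1/2
auc_emp &lt;- mean(outer(cases, controls, &quot;&gt;&quot;) +
                0.5 * outer(cases, controls, &quot;==&quot;))
auc_emp
</pre></div>


<p>The AUC computed this way is the Mann&#8211;Whitney statistic divided by the number of case&#8211;control pairs, so it should agree with <code>auc(roc1)</code>.</p>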



<p>You can then draw the ROC curve with the <code>plot()</code> function.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
plot(roc1)
</pre></div>


<figure class="wp-block-image size-full"><img decoding="async" width="400" height="400" src="https://brain-storm.space/wp-content/uploads/2023/03/roc_1.jpeg" alt="" class="wp-image-1224"/><figcaption class="wp-element-caption">Assay1</figcaption></figure>



<h3>Convert to Percentage Display</h3>



<p>To show the axes as percentages, pass <code>percent=TRUE</code> to <code>roc()</code>; sensitivity, specificity, and the AUC are then expressed in percent.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# a new name keeps roc1 (proportions) unchanged for the later sections
roc1.pct &lt;- roc(df$condition, df$assay1, percent=TRUE,
                levels=c(&quot;healthy&quot;, &quot;disease&quot;))
plot(roc1.pct)
</pre></div>


<figure class="wp-block-image size-full"><img decoding="async" width="400" height="400" src="https://brain-storm.space/wp-content/uploads/2023/03/roc_plot2_percent.jpeg" alt="" class="wp-image-1225"/></figure>



<h3>Display Confidence Interval of Sensitivity</h3>



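<p>You can also display the confidence interval of the sensitivity. <code>ci.se()</code> computes bootstrap confidence intervals of the sensitivity at a series of specificities, and <code>plot(rocCI, type="shape")</code> draws them as a shaded band on top of the current plot; <code>col</code> sets its color. Because the intervals come from resampling, they change slightly from run to run unless you call <code>set.seed()</code> first.</p>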


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
roc2 &lt;- roc(df$condition, df$assay2, ci=TRUE,
            levels=c(&quot;healthy&quot;, &quot;disease&quot;))
plot(roc2)
rocCI &lt;- ci.se(roc2)
plot(rocCI, type=&quot;shape&quot;, col=&quot;lightblue&quot;)
</pre></div>


<figure class="wp-block-image size-full"><img decoding="async" width="400" height="400" src="https://brain-storm.space/wp-content/uploads/2023/03/roc_ci_3.jpeg" alt="" class="wp-image-1227"/></figure>



<h2>Display Optimal Cutoff Point</h2>



<p>One important use of the ROC curve is choosing an optimal cutoff point. How to determine it is a big topic in its own right, so here I introduce just two methods: the Youden Index and the top-left method.</p>



<p>Briefly, the Youden Index is calculated as J = sensitivity + specificity &#8211; 1, and the threshold with the highest J is selected as the optimal cutoff. The top-left method instead selects the point on the ROC curve closest to the upper-left corner, where sensitivity and specificity are both 1; that is, the threshold that minimizes (1 &#8211; sensitivity)<sup>2</sup> + (1 &#8211; specificity)<sup>2</sup>.</p>
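


<p>Both criteria are easy to compute by hand. The base-R sketch below assumes, as in this example, that higher values indicate disease. It evaluates every midpoint between neighboring <code>assay1</code> values as a candidate cutoff and applies the two formulas. Note that <code>which.max()</code> and <code>which.min()</code> simply return the first cutoff if several are tied, which pROC may handle differently.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Regenerate the sample data (assay1 only)
set.seed(3)
condition &lt;- rep(c(&quot;healthy&quot;, &quot;disease&quot;), each = 50)
assay1 &lt;- c(rnorm(40, mean = 1, sd = 2), rnorm(10, mean = 3, sd = 1),
            rnorm(40, mean = 5, sd = 2), rnorm(10, mean = 7, sd = 1))
cases    &lt;- assay1[condition == &quot;disease&quot;]
controls &lt;- assay1[condition == &quot;healthy&quot;]

# Candidate cutoffs: midpoints between neighboring observed values,
# so no observation falls exactly on a cutoff
v    &lt;- sort(unique(assay1))
thr  &lt;- (head(v, -1) + tail(v, -1)) / 2
sens &lt;- sapply(thr, function(t) mean(cases &gt; t))     # diseased above cutoff
spec &lt;- sapply(thr, function(t) mean(controls &lt; t))  # healthy below cutoff

# Youden Index: maximize J = sensitivity + specificity - 1
J &lt;- sens + spec - 1
youden &lt;- which.max(J)

# Top left: minimize the squared distance to the corner (specificity 1, sensitivity 1)
d2 &lt;- (1 - sens)^2 + (1 - spec)^2
topleft &lt;- which.min(d2)

data.frame(method      = c(&quot;youden&quot;, &quot;closest.topleft&quot;),
           threshold   = thr[c(youden, topleft)],
           sensitivity = sens[c(youden, topleft)],
           specificity = spec[c(youden, topleft)])
</pre></div>


<p>You can compare the result with the cutoffs that pROC marks on the plots below.</p>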



<h3>Display Optimal Cutoff Based on Youden Index</h3>



<p>You can mark the optimal cutoff on the plot by adding arguments to <code>plot()</code>. Setting <code>print.thres = "best"</code> and <code>print.thres.best.method = "youden"</code> displays the optimal point, and <code>identity = TRUE</code> draws the diagonal reference line. Setting <code>legacy.axes = TRUE</code> labels the x-axis as 1 &#8211; Specificity, running from 0 to 1, instead of Specificity.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
plot(roc1, main = &quot;ROC Curve&quot;,      
     identity = TRUE,
     print.thres = &quot;best&quot;,
     print.thres.best.method = &quot;youden&quot;,
     legacy.axes = TRUE)
</pre></div>


<figure class="wp-block-image size-full"><img decoding="async" width="400" height="400" src="https://brain-storm.space/wp-content/uploads/2023/03/roc_youden_5.jpeg" alt="" class="wp-image-1231"/><figcaption class="wp-element-caption">optimal cutoff using the Youden Index (J = Sensitivity + Specificity &#8211; 1)</figcaption></figure>



<p>In this case, the optimal cutoff is 3.912, with a sensitivity of 0.820 and a specificity of 0.960.</p>



<h3>Display Optimal Cutoff using Top Left Method</h3>



<p>For the top-left method, set <code>print.thres.best.method="closest.topleft"</code> in the <code>plot()</code> function.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
plot(roc1, main = &quot;ROC Curve&quot;,      
     identity = TRUE,
     print.thres = &quot;best&quot;,
     print.thres.best.method = &quot;closest.topleft&quot;,
     legacy.axes = TRUE)

</pre></div>


<figure class="wp-block-image size-full"><img decoding="async" width="400" height="400" src="https://brain-storm.space/wp-content/uploads/2023/03/roc_topleft_4.jpeg" alt="" class="wp-image-1234"/><figcaption class="wp-element-caption">The optimal cutoff based on the top left approach</figcaption></figure>



<p>Note that in this case, the optimal cutoff is 3.365, with a sensitivity of 0.860 and specificity of 0.880.</p>



<h2>Extracting Necessary Values</h2>



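<p>To report the AUC (area under the curve), sensitivity, specificity, and related values in a presentation or paper, extract them with <code>auc()</code> and <code>coords()</code> as follows. With <code>"best"</code>, <code>coords()</code> uses the Youden Index unless <code>best.method</code> specifies otherwise.</p>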


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
&gt; auc(roc1)
Area under the curve: 0.9536

# youden
&gt; coords(roc1, &quot;best&quot;, ret=c(&quot;threshold&quot;, &quot;sens&quot;, &quot;spec&quot;, &quot;ppv&quot;, &quot;npv&quot;))
          threshold sensitivity specificity       ppv       npv
threshold  3.912407        0.82        0.96 0.9534884 0.8421053

# topleft
&gt; coords(roc1, &quot;best&quot;, best.method=&quot;closest.topleft&quot;, ret=c(&quot;threshold&quot;, &quot;sens&quot;, &quot;spec&quot;, &quot;ppv&quot;, &quot;npv&quot;))
          threshold sensitivity specificity      ppv       npv
threshold  3.364778        0.86        0.88 0.877551 0.8627451
</pre></div>
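

<p>Unlike sensitivity and specificity, PPV and NPV depend on how common the disease is. The short sketch below recomputes them from sensitivity, specificity, and prevalence using Bayes&#8217; rule; the function name <code>ppv_npv()</code> is my own. With the 50% prevalence of the sample data, it reproduces the <code>ppv</code> and <code>npv</code> columns above. For example, at the Youden cutoff 41 of 50 diseased people and 2 of 50 healthy people test positive, so PPV = 41/43 &#8776; 0.953.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Hypothetical helper: PPV and NPV from sensitivity, specificity and
# disease prevalence, using Bayes' rule
ppv_npv &lt;- function(sens, spec, prev) {
  ppv &lt;- sens * prev / (sens * prev + (1 - spec) * (1 - prev))
  npv &lt;- spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
  c(ppv = ppv, npv = npv)
}

# Sample data: 50 of 100 people have the disease, so prevalence = 0.5
ppv_npv(0.82, 0.96, prev = 0.5)  # Youden cutoff: 41/43 and 48/57
ppv_npv(0.86, 0.88, prev = 0.5)  # top-left cutoff: 43/49 and 44/51

# The same Youden cutoff where only 5% of people have the disease
ppv_npv(0.82, 0.96, prev = 0.05)
</pre></div>


<p>At 5% prevalence, the PPV of the same cutoff drops to about 0.52. Values from a 50:50 sample therefore should not be reported as if they applied to a population where the disease is rare.</p>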


<h2>Compare Two ROC Curves by Overlaying Them</h2>



<p>We have created sample data assuming two tests. Let&#8217;s see which test is better by comparing the two ROC curves.</p>



<h3>Overlay Two ROC Curves</h3>



<p>To overlay two ROC curves, draw the first with <code>plot()</code> and add the second with either <code>lines()</code> or <code>plot(..., add=TRUE)</code>. Use <code>col</code> to set the colors.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Use roc() to create objects for the two assays
roc1 &lt;- roc(df$condition, df$assay1, ci=TRUE,
            levels=c(&quot;healthy&quot;, &quot;disease&quot;))
roc2 &lt;- roc(df$condition, df$assay2, ci=TRUE,
            levels=c(&quot;healthy&quot;, &quot;disease&quot;))

# Use lines()
obj1 &lt;- plot(roc1,
             col=&quot;red&quot;)
obj2 &lt;- lines(roc2,
              col=&quot;blue&quot;)

# Use add = TRUE
obj1 &lt;- plot(roc1,
             col=&quot;red&quot;)
obj2 &lt;- plot(roc2,
             col=&quot;blue&quot;,
             add=TRUE)
</pre></div>


<figure class="wp-block-image size-full"><img decoding="async" width="400" height="400" src="https://brain-storm.space/wp-content/uploads/2023/03/roc_two_assay_6.jpeg" alt="" class="wp-image-1239"/></figure>



<h3>Compare Two Assays</h3>



<p>To compare the two assays, use <code>roc.test()</code>. By default, it compares the AUCs of the two paired curves with DeLong&#8217;s method; specify <code>method = "bootstrap"</code> to use the bootstrap instead. Below, <code>text()</code> prints the p-value near the center of the plot, and <code>main=</code> sets the title.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
roc1 &lt;- roc(df$condition, df$assay1, ci=TRUE,
            levels=c(&quot;healthy&quot;, &quot;disease&quot;))
roc2 &lt;- roc(df$condition, df$assay2, ci=TRUE,
            levels=c(&quot;healthy&quot;, &quot;disease&quot;))

obj1 &lt;- plot(roc1,
             main=&quot;Comparison&quot;,
             col=&quot;red&quot;)
obj2 &lt;- lines(roc2,
              col=&quot;blue&quot;)
obj &lt;- roc.test(obj1, obj2)
text(.5, .5, labels=paste(&quot;p-value =&quot;, format.pval(obj$p.value, 3)), 
     adj=c(0, .5))
</pre></div>


<figure class="wp-block-image size-full"><img decoding="async" width="400" height="400" src="https://brain-storm.space/wp-content/uploads/2023/03/roc_comparison_7.jpeg" alt="" class="wp-image-1242"/></figure>



<p>If you just want to see the comparison results, <code>roc.test()</code> will provide you with the results.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
&gt; roc.test(roc1, roc2)

	DeLong's test for two correlated ROC curves

data:  roc1 and roc2
Z = 3.7621, p-value = 0.0001685
alternative hypothesis: true difference in AUC is not equal to 0
95 percent confidence interval:
 0.0969536 0.3078464
sample estimates:
AUC of roc1 AUC of roc2 
     0.9536      0.7512 
</pre></div>
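

<p>For the curious, DeLong&#8217;s test can be written in a few lines of base R. The sketch below, with the hypothetical function <code>delong_paired()</code>, follows the textbook form of the method: per-observation &#8220;structural components&#8221; give the variances and the covariance of the two AUCs, and their difference is compared with a standard normal distribution. It assumes that higher values indicate disease for both assays; pROC chooses the direction automatically, and <code>roc.test()</code> remains the tool to use in practice.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Regenerate the sample data
set.seed(3)
condition &lt;- rep(c(&quot;healthy&quot;, &quot;disease&quot;), each = 50)
assay1 &lt;- c(rnorm(40, mean = 1, sd = 2), rnorm(10, mean = 3, sd = 1),
            rnorm(40, mean = 5, sd = 2), rnorm(10, mean = 7, sd = 1))
assay2 &lt;- c(rnorm(30, mean = 1, sd = 3), rnorm(20, mean = 2, sd = 2),
            rnorm(30, mean = 3, sd = 3), rnorm(20, mean = 4, sd = 2))

# Hypothetical helper: DeLong's test for two AUCs measured on the same people
delong_paired &lt;- function(x1, x2, is_case, conf.level = 0.95) {
  # For one predictor: psi = 1 if case &gt; control, 0.5 if tied, 0 otherwise
  components &lt;- function(x) {
    psi &lt;- outer(x[is_case], x[!is_case], &quot;&gt;&quot;) +
      0.5 * outer(x[is_case], x[!is_case], &quot;==&quot;)
    list(auc = mean(psi),
         v10 = rowMeans(psi),   # one value per case
         v01 = colMeans(psi))   # one value per control
  }
  a &lt;- components(x1)
  b &lt;- components(x2)
  m &lt;- sum(is_case)
  n &lt;- sum(!is_case)
  # Covariance matrix of the two AUC estimates
  s &lt;- cov(cbind(a$v10, b$v10)) / m + cov(cbind(a$v01, b$v01)) / n
  d  &lt;- a$auc - b$auc
  se &lt;- sqrt(s[1, 1] + s[2, 2] - 2 * s[1, 2])
  z  &lt;- d / se
  list(auc = c(a$auc, b$auc), z = z, p.value = 2 * pnorm(-abs(z)),
       conf.int = d + c(-1, 1) * qnorm(1 - (1 - conf.level) / 2) * se)
}

delong_paired(assay1, assay2, is_case = condition == &quot;disease&quot;)
</pre></div>


<p>With the sample data, the result should be comparable to the <code>roc.test()</code> output above: the two AUCs, the Z statistic, the p-value, and the 95 percent confidence interval of the AUC difference.</p>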


<p>Finally, let&#8217;s add a legend.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
roc1 &lt;- roc(df$condition, df$assay1, ci=TRUE,
            levels=c(&quot;healthy&quot;, &quot;disease&quot;))
roc2 &lt;- roc(df$condition, df$assay2, ci=TRUE,
            levels=c(&quot;healthy&quot;, &quot;disease&quot;))

obj1 &lt;- plot(roc1,
             main=&quot;Comparison&quot;,
             col=&quot;red&quot;)
obj2 &lt;- lines(roc2,
              col=&quot;blue&quot;)

obj &lt;- roc.test(obj1, obj2)

# p values in the graph
text(.5, .5, labels=paste(&quot;p-value =&quot;, format.pval(obj$p.value, 3)), 
     adj=c(0, .5))

#legend
legend(&quot;bottomright&quot;, legend=c(&quot;Assay1&quot;, &quot;Assay2&quot;),
       col=c(&quot;red&quot;, &quot;blue&quot;), lty=1, lwd=2)
</pre></div>


<figure class="wp-block-image size-full"><img decoding="async" width="400" height="400" src="https://brain-storm.space/wp-content/uploads/2023/03/roc_comparison_final_8-1.jpeg" alt="" class="wp-image-1251"/></figure>



<p>That&#8217;s how to draw and compare ROC curves in R with the pROC package. I hope this was helpful.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://brain-storm.space/roc-curve-en/1304/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[R for beginners] Creating Error Bars for Bar Graphs and Line Graphs in ggplot2</title>
		<link>https://brain-storm.space/error-bar-en/1291/</link>
		
		<dc:creator><![CDATA[brainblog]]></dc:creator>
		<pubDate>Fri, 24 Mar 2023 13:13:56 +0000</pubDate>
				<category><![CDATA[Statistical Analysis]]></category>
		<category><![CDATA[bar graph]]></category>
		<category><![CDATA[beginner]]></category>
		<category><![CDATA[error bar]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[tidyverse]]></category>
		<guid isPermaLink="false">https://brain-storm.space/?p=1291</guid>

					<description><![CDATA[The &#8216;T&#8217; symbol on bar graphs and line graphs represents error bars that extend above and below the]]></description>
										<content:encoded><![CDATA[
<p>The &#8216;T&#8217;-shaped marks on bar graphs and line graphs are error bars: they extend above and below each data point to show the standard error or standard deviation. Adding error bars with R&#8217;s ggplot2 is easy. Here&#8217;s a step-by-step guide:</p>



<h2>Preparing the Data</h2>



<p>We&#8217;ll be using tidyverse for this. If you have your own data, skip ahead.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Set working directory
setwd(&quot;~/Rpractice/&quot;)

# Load tidyverse
library(tidyverse)

# Generate random data: 50 values each for A, B and C
df &lt;- data.frame(A = rnorm(50, 30, 10),
                 B = rnorm(50, 50, 5),
                 C = rnorm(50, 40, 15))

# Transform the data from wide to long
df.long &lt;- pivot_longer(df, cols = A:C, 
                        names_to = &quot;Categories&quot;, 
                        values_to = &quot;Values&quot;)
</pre></div>


<p>Now we have data that looks like this:</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
&gt; head(df.long)
# A tibble: 6 x 2
  Categories Values
  &lt;chr&gt;       &lt;dbl&gt;
1 A            26.8
2 B            53.7
3 C            27.3
4 A            31.0
5 B            58.7
6 C            56.2
&gt; 
</pre></div>


<p>We used <code>pivot_longer()</code> to convert the data from wide to long format, with one row per observation. To add error bars to bar graphs and line graphs, you need the mean and standard error (or standard deviation) of each category.</p>



<h2>Calculating the Mean, Standard Deviation, and Standard Error of the Data</h2>



<p>To add error bars, we need to calculate the mean, standard deviation, and standard error. If you already have this data, you can skip this section.</p>



<p>Calculating these values is easy with the dplyr package in tidyverse.</p>



<h3>Calculate with group_by() and summarise_all() in dplyr</h3>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
a &lt;- group_by(df.long, Categories) %&gt;% 
  summarise_all(list(mean = ~mean(.), 
                     sd = ~sd(.), 
                     se = ~sd(.)/sqrt(length(.))))

</pre></div>


<p>This code can be simplified slightly:</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
a &lt;- group_by(df.long, Categories) %&gt;% 
  summarise_all(list(mean = mean, 
                     sd = sd, 
                     se = ~sd(.)/sqrt(length(.))))

</pre></div>


<p>Let&#8217;s look at the contents of <code>a</code>:</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
&gt; a
# A tibble: 3 x 4
  Categories  mean    sd    se
  &lt;chr&gt;      &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 A           30.1  9.93 1.40 
2 B           49.2  5.00 0.708
3 C           43.9 15.5  2.20 
</pre></div>


<p>These are the summary statistics of the generated data. Because the data are generated randomly without a fixed seed, your values will differ slightly if you follow the same steps.</p>
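


<p>If you prefer not to use dplyr, the same table can be built in base R. This sketch uses <code>split()</code> and <code>sapply()</code>, and the object name <code>a_base</code> is just for illustration. It also makes the relationship between the two measures of spread explicit: the standard error is the standard deviation divided by the square root of the sample size.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Sample data in the same long format (seeded, so values differ from those above)
set.seed(1)
df.long &lt;- data.frame(
  Categories = rep(c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;), each = 50),
  Values     = c(rnorm(50, 30, 10), rnorm(50, 50, 5), rnorm(50, 40, 15))
)

# Split the values by category and compute mean, SD and SE (= SD / sqrt(n))
stats &lt;- sapply(split(df.long$Values, df.long$Categories), function(x)
  c(mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x))))

# One row per category, like the dplyr result a
a_base &lt;- data.frame(Categories = colnames(stats), t(stats), row.names = NULL)
a_base
</pre></div>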



<h2>Drawing Error Bars in Bar Graphs</h2>



<p>We will use the calculated data to create a bar graph and specify the error bars using <code><span class="marker">geom_errorbar()</span></code>. First, let&#8217;s specify only <code>ymin</code> and <code>ymax</code> in <code>geom_errorbar()</code> and take a look at the graph.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(a, aes(x = Categories, y = mean, fill = Categories))+
  geom_bar(stat = &quot;identity&quot;) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se))

</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/06/errorbar1-1024x614.jpg" alt="" class="wp-image-905" width="512" height="307"/></figure>



<p>The bars are too wide. Let&#8217;s narrow both the bars and the error bars with the <code>width</code> argument.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(a, aes(x = Categories, y = mean, fill = Categories))+
  geom_bar(stat = &quot;identity&quot;, width = 0.6) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = .1)

</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/06/errorbar2-1024x614.jpg" alt="" class="wp-image-906" width="512" height="307"/></figure>



<p>It&#8217;s a good idea to adjust the widths to suit the size of the output image.</p>



<h2>Drawing Error Bars on a Line Graph</h2>



<p>The process for adding error bars to a line graph is the same as above. First, draw the line graph and then add <code>geom_errorbar()</code>.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(a, aes(x = Categories, y = mean)) +
  geom_line(aes(group = 1)) +  # treat all points as one group so they are connected
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = .1)
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/06/errorbar3-1024x614.jpg" alt="" class="wp-image-907" width="512" height="307"/></figure>



<h2>Adding Error Bars to Grouped Data</h2>



<p>Let&#8217;s generate grouped data.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Add ID column
data &lt;- df.long %&gt;% 
  tibble::rownames_to_column(var = &quot;ID&quot;)

# Convert ID from character to numeric
data$ID &lt;- as.numeric(data$ID)

# Assign 1 to ID &lt;= 75 and 0 to ID &gt;= 76 to create the &quot;group&quot; column
data &lt;- mutate(data, group = ifelse(ID &lt; 76, 1, 0))

# Calculate Mean, SD, and SE
b &lt;- group_by(data, group, Categories) %&gt;% 
  summarise_at(vars(Values), list(mean = ~mean(.), 
                     sd = ~sd(.), 
                     se = ~sd(.)/sqrt(length(.))))

# Convert group column values from numeric to character
b$group &lt;- as.character(b$group)

# Check the data included in b
&gt; head(b)
# A tibble: 6 x 5
# Groups:   group [2]
  group Categories  mean    sd    se
  &lt;chr&gt; &lt;chr&gt;      &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 0     A           31.4 11.1  2.21 
2 0     B           51.0  4.31 0.862
3 0     C           33.9 13.5  2.70 
4 1     A           29.5 10.4  2.08 
5 1     B           49.5  4.02 0.804
6 1     C           36.5 15.8  3.16 
</pre></div>


<p>Now that we have grouped data, let&#8217;s draw a bar graph with error bars.</p>



<h3>Adding Error Bars to Grouped Bar Graphs</h3>



<p>By specifying <code>position = "dodge"</code>, you can create a grouped bar chart like this.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(b, aes(x = Categories, y = mean, fill = group))+
  geom_bar(stat = &quot;identity&quot;, width = 0.6, position = &quot;dodge&quot;)
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/06/errorbar4-1024x614.jpg" alt="" class="wp-image-909" width="512" height="307"/></figure>



<p>Let&#8217;s add error bars to this plot.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(b, aes(x = Categories, y = mean, fill = group))+
  geom_bar(stat = &quot;identity&quot;, width = 0.6, position = &quot;dodge&quot;)+
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), 
                    position = &quot;dodge&quot;, width = .1)
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/06/errorbar5-1024x614.jpg" alt="" class="wp-image-910" width="512" height="307"/></figure>



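<p>The error bars are not aligned with the bars: with <code>position = "dodge"</code>, they are dodged only by their own narrow width (0.1), so they cluster near the center of each category. Setting <span class="marker"><code>position = position_dodge(0.6)</code></span>, which matches the bar width, lines them up with the bars.</p>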


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(b, aes(x = Categories, y = mean, fill = group)) +
  geom_bar(stat = &quot;identity&quot;, width = 0.6, position = &quot;dodge&quot;) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                position = position_dodge(0.6), width = .1)
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/06/errorbar6-1024x614.jpg" alt="" class="wp-image-911" width="512" height="307"/></figure>



<p>You&#8217;ve created a nice bar graph!</p>
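


<p>Why does 0.6 fix the alignment? As far as I understand ggplot2&#8217;s dodging, the groups at each category are spread evenly across the dodge width, and when no width is given the element&#8217;s own width is used. The small sketch below (<code>dodge_offsets()</code> is a made-up helper, not a ggplot2 function) computes the resulting horizontal offsets for two groups.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Hypothetical helper reproducing the dodge arithmetic for n equally wide
# groups at one x position: centers are spread evenly across the width
dodge_offsets &lt;- function(n_groups, width) {
  i &lt;- seq_len(n_groups)
  width * ((i - 0.5) / n_groups - 0.5)
}

dodge_offsets(2, width = 0.6)  # bars (width = 0.6): -0.15 and 0.15
dodge_offsets(2, width = 0.1)  # error bars dodged by their own width: -0.025 and 0.025
</pre></div>


<p>The bars end up 0.15 to the left and right of each category, while error bars dodged by their own width of 0.1 move only 0.025, which is why they bunched up in the middle. Giving both layers the same dodge width of 0.6 puts their centers in the same place.</p>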



<h3>Drawing Error Bars on a Grouped Line Graph</h3>



<p>The same can be done for a grouped line graph as well.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(b, aes(x = Categories, y = mean, group = group, color = group)) +
  geom_line() +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = .1, color = &quot;black&quot;)
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/06/errorbar7-1024x614.jpg" alt="" class="wp-image-913" width="512" height="307"/></figure>



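<p>The error bars are drawn on top of the points and overlap each other, which makes the plot hard to read. Because ggplot2 draws layers in the order they are added, let&#8217;s first <span class="marker">move the error bars to the back</span> by adding <code>geom_errorbar()</code> before the other layers.</p>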


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(b, aes(x = Categories, y = mean, group = group, color = group)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = .1, color = &quot;black&quot;) +
  geom_line() +
  geom_point(size = 3)
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/06/errorbar8-1024x614.jpg" alt="" class="wp-image-914" width="512" height="307"/></figure>



<p>By <span class="marker">shifting the position to the left or right</span> using <code>position_dodge()</code>, overlapping of error bars can be avoided.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(b, aes(x = Categories, y = mean, group = group, color = group)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = .1, color = &quot;black&quot;, position = position_dodge(.2)) +
  geom_line(position = position_dodge(.2)) +
  geom_point(size = 3, position = position_dodge(.2))

</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/06/errorbar10-1024x614.jpg" alt="" class="wp-image-915" width="512" height="307"/></figure>



<p>Now you can add error bars to both bar graphs and line graphs. I hope this was helpful.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Violin Plot in R &#8211; How to Draw It &#8211;</title>
		<link>https://brain-storm.space/violin-plot-en/1265/</link>
		
		<dc:creator><![CDATA[brainblog]]></dc:creator>
		<pubDate>Fri, 24 Mar 2023 10:33:25 +0000</pubDate>
				<category><![CDATA[Statistical Analysis]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Violinplot]]></category>
		<guid isPermaLink="false">https://brain-storm.space/?p=1265</guid>

					<description><![CDATA[There are several ways to visualize data, and one of them is the violin plot! This method is useful for compar]]></description>
										<content:encoded><![CDATA[
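<p>There are several ways to visualize data, and one of them is the violin plot. A violin plot shows the estimated distribution (density) of each group as a mirrored shape, which makes it useful for comparing multiple sets of data at a glance, and it has an appealing appearance.</p>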



<p>First, you&#8217;ll need to prepare your data. While you&#8217;re at it, you can also check the data distribution using a dot plot.</p>



<h2>Data Generation and Distribution Checking</h2>



<p>Now, let&#8217;s create some data. If you have your own data, feel free to use that instead.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Set working directory
setwd(&quot;~/Rpractice/&quot;)
# load tidyverse and ggbeeswarm
library(tidyverse)
library(ggbeeswarm)

# generate sample data: 100 values each for A, B and C
df &lt;- data.frame(A = rnorm(100, 5, 10),
                 B = rnorm(100, 20, 10),
                 C = rnorm(100, 15, 15))

# reshape the data to long format with pivot_longer
df.long &lt;- pivot_longer(df, cols = A:C, names_to = &quot;Categories&quot;, values_to = &quot;Values&quot;)

# draw dotplot
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_beeswarm(aes(color = Categories),
                size = 2,
                cex = 2,
                alpha = .8)+
  theme_classic()+
  theme(legend.position = &quot;none&quot;)

</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/ggbeeswarm1-1024x1024.jpg" alt="" class="wp-image-874" width="512" height="512"/></figure>



<p>The dot plot displays the data distribution and can be used to confirm that the data is appropriate for creating a violin plot.</p>



<h2>Draw a Simple Violin Plot</h2>



<p>To draw a violin plot with ggplot2, use the <code>geom_violin()</code> function. For a clean, simple look, <code>theme_classic()</code> sets a white background without grid lines.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_violin()+
  theme_classic()
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_1-1024x1024.jpg" alt="" class="wp-image-876" width="512" height="512"/></figure>



<p>The ends of the violins look cut off. Overlaying the individual data points on the violin plot shows why. You can add the points with either <code>geom_dotplot()</code> from ggplot2 or <code>geom_beeswarm()</code> from the ggbeeswarm package loaded above.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_violin()+
  geom_beeswarm(aes(color = Categories),
                size = 2,
                cex = 2,
                alpha = .8)+
  theme_classic()
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_2-1024x1024.jpg" alt="" class="wp-image-877" width="512" height="512"/></figure>



<p>You can see that the ends of the violins are cut off at the minimum and maximum values. If you don&#8217;t want the ends cut off, use <code>geom_violin(trim = FALSE)</code>.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_violin(trim = FALSE)+
  geom_beeswarm(aes(color = Categories),
                size = 2,
                cex = 2,
                alpha = .8)+
  theme_classic()
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_3-1024x1024.jpg" alt="" class="wp-image-878" width="512" height="512"/></figure>



<p>With trim = FALSE, the density tails extend beyond the smallest and largest observations, so each violin stretches vertically into regions that contain no data points.</p>
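


<p>One way to check what trim does is to inspect the values ggplot2 computes for the violin layer with layer_data(). The sketch below uses a hypothetical simulated data frame (df.demo). It compares the range of the y grid on which the outlines are drawn with the range of the observed values. With the default trim = TRUE, the grid should start and end exactly at the data; with trim = FALSE, it should extend beyond them.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
library(ggplot2)

# Hypothetical data standing in for df.long
set.seed(1)
df.demo = data.frame(
  Categories = factor(rep(c('A', 'B', 'C'), each = 30)),
  Values = c(rnorm(30, 10, 2), rnorm(30, 12, 3), rnorm(30, 9, 1.5))
)

p0 = ggplot(df.demo, aes(x = Categories, y = Values))
trimmed   = layer_data(p0 + geom_violin(), 1)
untrimmed = layer_data(p0 + geom_violin(trim = FALSE), 1)

# Range of the y grid used to draw the violin outlines
range(trimmed$y)      # should match the observed range
range(untrimmed$y)    # should extend beyond the observed range
range(df.demo$Values)
</pre></div>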



<h2>Add Color to Violin Plot</h2>



<h3>Add Color to the Borders</h3>



<p>To add color to the border of the violin plot, map Categories to color with aes(color = Categories). In the code below the mapping is placed in ggplot(); it can also go inside geom_violin().</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values, color = Categories))+
  geom_violin()+
  theme_classic()

</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_4-1024x1024.jpg" alt="" class="wp-image-879" width="512" height="512"/></figure>
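


<p>Where you place the color mapping matters once you add more layers. A mapping in ggplot() is inherited by every layer, whereas a mapping inside geom_violin() applies only to the violins. The sketch below uses a hypothetical simulated data frame (df.demo). It puts aes(color = Categories) inside geom_violin() and overlays jittered points with geom_jitter(). Because the points do not inherit the mapping, they keep their single default color.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
library(ggplot2)

# Hypothetical data standing in for df.long
set.seed(1)
df.demo = data.frame(
  Categories = factor(rep(c('A', 'B', 'C'), each = 30)),
  Values = c(rnorm(30, 10, 2), rnorm(30, 12, 3), rnorm(30, 9, 1.5))
)

p = ggplot(df.demo, aes(x = Categories, y = Values))+
  # The color mapping is local to this layer only
  geom_violin(aes(color = Categories))+
  # The points do not inherit the mapping, so they use the default color
  geom_jitter(width = 0.1, height = 0)+
  theme_classic()
print(p)

length(unique(layer_data(p, 1)$colour))  # one border color per category
length(unique(layer_data(p, 2)$colour))  # one color shared by all points
</pre></div>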



<h3>Fill the Interior of the Violin Plot</h3>



<p>To fill the interior of the violin plot with color, map Categories to fill with aes(fill = Categories). As with the border color, the mapping can go in ggplot() or inside geom_violin().</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  theme_classic()
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_5-1024x1024.jpg" alt="" class="wp-image-881" width="512" height="512"/></figure>



<p>There are several ways to change the colors of the violin plot. One is scale_fill_brewer(), which applies a ColorBrewer palette. Here the palette is set to &quot;Set2&quot;.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  scale_fill_brewer(palette = &quot;Set2&quot;)+
  theme_classic()

</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_6-1024x1024.jpg" alt="" class="wp-image-882" width="512" height="512"/></figure>
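


<p>Another option is scale_fill_manual(), which lets you assign a specific color to each category. In the sketch below, the three hex codes are arbitrary illustrative choices and df.demo is a hypothetical simulated data frame that stands in for df.long. Naming the values vector by category keeps each color tied to its group, even if the factor levels are reordered.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
library(ggplot2)

# Hypothetical data standing in for df.long
set.seed(1)
df.demo = data.frame(
  Categories = factor(rep(c('A', 'B', 'C'), each = 30)),
  Values = c(rnorm(30, 10, 2), rnorm(30, 12, 3), rnorm(30, 9, 1.5))
)

# One color per category; the names must match the factor levels
my_colors = c(A = '#1B9E77', B = '#D95F02', C = '#7570B3')

p = ggplot(df.demo, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  scale_fill_manual(values = my_colors)+
  theme_classic()
print(p)
</pre></div>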



<h2>Add Mean or Median to the Violin Plot</h2>



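<p>To add the mean or median to the violin plot, use the stat_summary() function. It applies the function given in fun to the values of each category and draws the result with the geom you specify.</p>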



<h3>Add Mean</h3>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values, color = Categories))+
  geom_violin()+
  stat_summary(fun = mean, geom = &quot;point&quot;, 
               shape = 16, size = 2, color = &quot;red&quot;)+
  theme_classic()
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_7-1024x1024.jpg" alt="" class="wp-image-885" width="512" height="512"/></figure>
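


<p>To confirm that the red points are the group means, you can compare the values stat_summary() computes with tapply(). The sketch below does this on a hypothetical simulated data frame (df.demo) that stands in for df.long.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
library(ggplot2)

# Hypothetical data standing in for df.long
set.seed(1)
df.demo = data.frame(
  Categories = factor(rep(c('A', 'B', 'C'), each = 30)),
  Values = c(rnorm(30, 10, 2), rnorm(30, 12, 3), rnorm(30, 9, 1.5))
)

p = ggplot(df.demo, aes(x = Categories, y = Values, color = Categories))+
  geom_violin()+
  stat_summary(fun = mean, geom = 'point',
               shape = 16, size = 2, color = 'red')+
  theme_classic()

# y positions of the summary points, ordered by category
summary_layer = layer_data(p, 2)
summary_layer = summary_layer[order(summary_layer$x), ]
summary_layer$y

# Group means computed directly
tapply(df.demo$Values, df.demo$Categories, mean)
</pre></div>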



<div class="wp-block-jin-gb-block-border jin-sen"><div class="jin-sen-solid" style="border-width:3px;border-color:#f48789"></div></div>



<p>The shape parameter in stat_summary() takes the same point-symbol codes as pch in base R. The figure below shows the shape that corresponds to each number:</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/pch_figure-5.png" alt="" class="wp-image-722" width="693" height="389"/></figure>



<div class="wp-block-jin-gb-block-border jin-sen"><div class="jin-sen-solid" style="border-width:3px;border-color:#f48789"></div></div>



<h3>Add Median</h3>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values, color = Categories))+
  geom_violin()+
  stat_summary(fun = median, geom = &quot;point&quot;, 
               shape = 3, size = 2, color = &quot;red&quot;)+
  theme_classic()

</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_8-1024x1024.jpg" alt="" class="wp-image-886" width="512" height="512"/></figure>
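


<p>stat_summary() can also draw a range around the center. The sketch below is one possible variation and is not part of the original example. It marks the median with a point and the interquartile range (25th to 75th percentile) with a vertical line, using the fun.min and fun.max arguments and geom = 'pointrange'. The data frame df.demo is hypothetical simulated data.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
library(ggplot2)

# Hypothetical data standing in for df.long
set.seed(1)
df.demo = data.frame(
  Categories = factor(rep(c('A', 'B', 'C'), each = 30)),
  Values = c(rnorm(30, 10, 2), rnorm(30, 12, 3), rnorm(30, 9, 1.5))
)

p = ggplot(df.demo, aes(x = Categories, y = Values, color = Categories))+
  geom_violin()+
  # Median as the point, 25th and 75th percentiles as the line ends
  stat_summary(fun = median,
               fun.min = function(z) quantile(z, 0.25),
               fun.max = function(z) quantile(z, 0.75),
               geom = 'pointrange', color = 'red')+
  theme_classic()
print(p)
</pre></div>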



<h2>Change the Degree of Smoothing</h2>



<p>To change the degree of smoothing in the violin plot, use the adjust parameter inside geom_violin(). adjust multiplies the bandwidth of the kernel density estimate that draws the violin outline. The default value is 1.</p>
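


<p>adjust acts as a multiplier on the bandwidth. By default, geom_violin() uses the nrd0 bandwidth rule, which is also the default in base R&#8217;s density(). This means base R alone is enough to see how adjust scales the bandwidth. The sketch below uses hypothetical simulated data.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Hypothetical simulated data
set.seed(1)
x = rnorm(100)

d_default = density(x)                # adjust = 1
d_narrow  = density(x, adjust = 0.2)  # less smoothing
d_wide    = density(x, adjust = 2)    # more smoothing

# The default bandwidth comes from the nrd0 rule
all.equal(d_default$bw, bw.nrd0(x))   # TRUE

# adjust scales that bandwidth
d_narrow$bw / d_default$bw            # 0.2
d_wide$bw / d_default$bw              # 2
</pre></div>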



<p>To decrease the degree of smoothing, set adjust to a smaller value. A narrower bandwidth makes the outline follow the data more closely and show more detail. For example, to set adjust to 0.2, use the following code:</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin(adjust = .2)+
  theme_classic()
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_9-1024x1024.jpg" alt="" class="wp-image-888" width="512" height="512"/></figure>



<p>Conversely, if you want to increase the degree of smoothing in the violin plot, you can set the adjust parameter to a larger value.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin(adjust = 2)+
  theme_classic()
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_10-1024x1024.jpg" alt="" class="wp-image-889" width="512" height="512"/></figure>



<p>If you increase the degree of smoothing too much, the outline becomes oversmoothed and important features, such as separate peaks, can disappear.</p>
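


<p>To see how oversmoothing can hide structure, the sketch below simulates hypothetical data with two clearly separated clusters. It then counts the peaks of the density estimate at two adjust values. The helper count_peaks() is a hypothetical function written for this illustration. It ignores tiny bumps lower than 5% of the highest peak. With the default bandwidth, the two clusters should appear as separate peaks; a large adjust value should merge them into one.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
# Hypothetical bimodal data: two clusters centered at 0 and 5
set.seed(1)
x = c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1))

# Count local maxima of a density estimate, ignoring very small bumps
count_peaks = function(d, min_height = 0.05) {
  y = d$y
  # A peak is where the slope changes from positive to negative
  peak_idx = which(diff(sign(diff(y))) == -2) + 1
  sum(y[peak_idx] > min_height * max(y))
}

count_peaks(density(x))              # default adjust = 1
count_peaks(density(x, adjust = 5))  # heavily oversmoothed
</pre></div>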



<h2>Overlaying a Box Plot</h2>



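<p>Overlaying a narrow box plot on a violin plot is a common technique in data visualization. The violin shows the shape of the distribution, while the box plot marks the median, the interquartile range and the whiskers. Setting width = .1 keeps the box thin enough to sit inside the violin.</p>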


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  geom_boxplot(width = .1, fill = &quot;white&quot;)+
  theme_classic()

</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_boxplot1-1024x1024.jpg" alt="" class="wp-image-890" width="512" height="512"/></figure>
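


<p>The box in the plot above spans the first to third quartiles with a line at the median. As a rough check, the sketch below compares the box edges computed by ggplot2 with quantile() on a hypothetical simulated data frame (df.demo). It assumes ggplot2 uses quantile()&#8217;s default definition (type 7). If your version uses a different definition, the values may not match exactly.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
library(ggplot2)

# Hypothetical data standing in for df.long
set.seed(1)
df.demo = data.frame(
  Categories = factor(rep(c('A', 'B', 'C'), each = 30)),
  Values = c(rnorm(30, 10, 2), rnorm(30, 12, 3), rnorm(30, 9, 1.5))
)

p = ggplot(df.demo, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  geom_boxplot(width = .1, fill = 'white')+
  theme_classic()

# Box statistics computed by ggplot2, one row per category
box = layer_data(p, 2)
box = box[order(box$x), c('lower', 'middle', 'upper')]
box

# The same quartiles computed directly
q = t(sapply(split(df.demo$Values, df.demo$Categories),
             quantile, probs = c(0.25, 0.5, 0.75)))
q
</pre></div>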



<p>Because the violin already reflects the full distribution, the box plot&#8217;s outlier points are often redundant. To hide them, set outlier.color = NA inside geom_boxplot(). Setting outlier.shape = NA works as well.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  geom_boxplot(width = .1, fill = &quot;white&quot;, outlier.color = NA)+
  theme_classic()

</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_boxplot2-1024x1024.jpg" alt="" class="wp-image-891" width="512" height="512"/></figure>
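


<p>Setting outlier.color = NA only hides the outlier points. geom_boxplot() still flags them, and they remain part of the violin&#8217;s density estimate. The sketch below adds one deliberately extreme value (25) to group A of a hypothetical simulated data frame (df.demo). It then reproduces the usual 1.5 times IQR rule by hand for that group. The variable names a, q, lo and hi are chosen only for this illustration.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
library(ggplot2)

# Hypothetical data; the value 25 is planted as an obvious outlier in group A
set.seed(1)
df.demo = data.frame(
  Categories = factor(rep(c('A', 'B', 'C'), each = 30)),
  Values = c(rnorm(29, 10, 2), 25, rnorm(30, 12, 3), rnorm(30, 9, 1.5))
)

p = ggplot(df.demo, aes(x = Categories, y = Values, fill = Categories))+
  geom_violin()+
  geom_boxplot(width = .1, fill = 'white', outlier.color = NA)+
  theme_classic()

# Outliers flagged by ggplot2 for group A (hidden in the plot, not removed)
box = layer_data(p, 2)
box = box[order(box$x), ]
box$outliers[[1]]

# The same rule by hand: more than 1.5 * IQR beyond the quartiles
a = df.demo$Values[df.demo$Categories == 'A']
q = quantile(a, c(0.25, 0.75))
lo = q[1] - 1.5 * diff(q)
hi = q[2] + 1.5 * diff(q)
a[a > hi | lo > a]
</pre></div>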



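<p>You can also fill the box plot with black and mark the median with a white circle. Shape 21 is a circle whose fill color is set separately from its border, so fill = &quot;white&quot; gives a white dot with a dark outline.</p>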


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: r; title: Code example; notranslate">
ggplot(df.long, aes(x = Categories, y = Values))+
  geom_violin()+
  geom_boxplot(width = .1, fill = &quot;black&quot;, outlier.color = NA) +
  stat_summary(fun = median, geom = &quot;point&quot;, fill = &quot;white&quot;, shape = 21, size = 3) +
  theme_classic()
</pre></div>


<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://brain-storm.space/wp-content/uploads/2021/05/gg_violin_boxplot3-1024x1024.jpg" alt="" class="wp-image-892" width="512" height="512"/></figure>



<p>With these steps, you should now be able to create your own violin plots. I hope this guide proves helpful!</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
