Reputation: 231
I work with R and RStudio. I have a longitudinal data frame that looks basically like this:
# numeric vectors (not quoted strings), so the values can go on a continuous y axis
trait_A_time_1 <- c(2.2, 2.9, 1.4, 3.6)
trait_A_time_2 <- c(4.2, 3.2, 2.1, 4.0)
trait_A_time_3 <- c(2.2, 2.5, 3.4, 1.9)
trait_A_time_4 <- c(3.2, 3.9, 4.5, 4.7)
trait_A_time_5 <- c(2.8, 3.3, 4.0, 1.1)
df <- data.frame(trait_A_time_1, trait_A_time_2, trait_A_time_3, trait_A_time_4, trait_A_time_5)
print(df)
trait_A_time_1 trait_A_time_2 trait_A_time_3 trait_A_time_4 trait_A_time_5
1 2.2 4.2 2.2 3.2 2.8
2 2.9 3.2 2.5 3.9 3.3
3 1.4 2.1 3.4 4.5 4.0
4 3.6 4.0 1.9 4.7 1.1
It measures a psychological trait in the same persons at several measurement occasions over a few weeks. Now I want to make a boxplot that looks like this:
x axis (groups): the measurement occasions (time 1 to time 5)
y axis: levels of trait A in the sample
I tried this code:
p <- ggplot(data2, aes(x=, y=)) +
geom_violin()
p
But it does not work, since I have no dedicated variables for the occasion or the level of trait A. How exactly can I get those? How do I need to transpose/restructure this dataset to get the boxplots I want?
Upvotes: 0
Views: 169
Reputation: 571
I made up some sample data. This should do it:
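library(tidyverse)

# Made-up sample data: two fixed values plus 10 random draws per occasion
df <- tibble(`trait A time 1` = c(3.3, 2.1, rnorm(10)),
             `trait A time 2` = c(4.1, 2.2, rnorm(10)),
             `trait A time 5` = c(3.9, 1.9, rnorm(10)))

df %>%
  # Strip "trait A time" from the column names, leaving " 1", " 2", " 5"
  rename_with(.fn = function(x) gsub('trait A time', "", x)) %>%
  # Reshape to long format: one row per observation, in columns `name` and `value`
  pivot_longer(cols = everything()) %>%
  ggplot(data = .,
         aes(x = name, y = value)) +
  geom_violin() +
  labs(x = "time", y = "trait A")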
You don't necessarily have to rename the columns as I did here; the gist of the code is the pivoting with pivot_longer(). If you want boxplots rather than violins, swap geom_violin() for geom_boxplot().
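To connect this back to the question's own data, here is a minimal sketch that skips the renaming step and instead lets pivot_longer() strip the column prefix through its names_prefix argument. It assumes the values are entered as numbers, and it uses geom_boxplot() because the question asks for boxplots; the object names long and p are only illustrative.

```r
library(tidyr)
library(ggplot2)

# The question's data, entered as numbers rather than strings
df <- data.frame(trait_A_time_1 = c(2.2, 2.9, 1.4, 3.6),
                 trait_A_time_2 = c(4.2, 3.2, 2.1, 4.0),
                 trait_A_time_3 = c(2.2, 2.5, 3.4, 1.9),
                 trait_A_time_4 = c(3.2, 3.9, 4.5, 4.7),
                 trait_A_time_5 = c(2.8, 3.3, 4.0, 1.1))

# Wide (4 x 5) to long (20 x 2): one row per person-occasion observation
long <- pivot_longer(df,
                     cols = starts_with("trait_A_time_"),
                     names_to = "time",
                     names_prefix = "trait_A_time_",  # keep only "1".."5"
                     values_to = "trait_A")

# One box per measurement occasion
p <- ggplot(long, aes(x = time, y = trait_A)) +
  geom_boxplot() +
  labs(x = "time", y = "trait A")
p
```

Because names_to and values_to give the new columns meaningful names, the aes() call reads directly as "time on x, trait A on y".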
EDIT:
As requested, I will briefly explain what the first two steps of the pipe do. rename_with() is a function from the dplyr package that renames columns. It offers several ways to select and rename columns; in this case I passed it a function that is applied to all column names. The function simply replaces 'trait A time' in each column name with an empty string ''. It is not the cleanest approach, but it serves its purpose.
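As a small illustration of those other options, the sketch below passes rename_with() a purrr-style lambda (~ ...) instead of a full function, and restricts the renaming with .cols. The id column is hypothetical, added only to show a column that is left untouched.

```r
library(dplyr)

# Two of the question's columns plus a hypothetical `id` column
wide <- data.frame(trait_A_time_1 = c(2.2, 2.9),
                   trait_A_time_2 = c(4.2, 3.2),
                   id = c(1, 2))

# Only columns starting with "trait_A" are renamed; `id` keeps its name
renamed <- rename_with(wide, ~ sub("trait_A_time_", "", .x),
                       .cols = starts_with("trait_A"))
names(renamed)
```

The trait columns come out named "1" and "2", while id is unchanged.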
pivot_longer() is a very handy function (from the tidyr package, not dplyr) which you will likely use often from now on if you continue to work with R. Essentially, it transforms the data frame you have into a data frame with more rows, making it a longer data frame. Long data frames are usually the way to go for plotting with ggplot. By default it creates a name column and a value column; the names of these columns can be changed with the names_to and values_to arguments. Notice that every row of this long data frame holds only one observation, namely a name (the measurement time in your case) and its corresponding value. Before, you had a wide data frame in which each row contains several observations, which is hard to map onto a single x and y aesthetic in a plot.
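To make the wide-to-long change concrete, here is a minimal sketch on the first two persons and first three occasions of the question's data. The person column is a hypothetical addition: in longitudinal data it is usually worth keeping an id so that each long row records which person, which occasion and which value. The sketch compares the default name/value columns with explicitly named ones.

```r
library(tidyr)

# Toy wide data: 2 persons x 3 occasions (values from the question);
# the `person` id column is hypothetical, added for illustration
wide <- data.frame(person = c("p1", "p2"),
                   trait_A_time_1 = c(2.2, 2.9),
                   trait_A_time_2 = c(4.2, 3.2),
                   trait_A_time_3 = c(2.2, 2.5))

# Default: pivoted columns go into `name` and `value`
long_default <- pivot_longer(wide, cols = -person)

# Same reshape, with the new columns named explicitly
long <- pivot_longer(wide, cols = -person,
                     names_to = "occasion", values_to = "trait_A")

dim(wide)  # 2 rows, 4 columns
dim(long)  # 6 rows (2 persons x 3 occasions), 3 columns
```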
df %>%
rename_with(.fn = function(x) gsub('trait A time', "", x)) %>%
pivot_longer(cols = everything()) %>%
print()
#> # A tibble: 36 x 2
#> name value
#> <chr> <dbl>
#> 1 " 1" 3.3
#> 2 " 2" 4.1
#> 3 " 5" 3.9
#> 4 " 1" 2.1
#> 5 " 2" 2.2
#> 6 " 5" 1.9
#> 7 " 1" 0.293
#> 8 " 2" 0.274
#> 9 " 5" -0.869
#> 10 " 1" 2.30
#> # ... with 26 more rows
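The name column above keeps a leading space (" 1") because the pattern 'trait A time' does not include the trailing space. It is also character, and ggplot orders character values alphabetically, so with ten or more occasions "10" would land before "2" on the x axis. A minimal sketch of one way to tidy this up, assuming hypothetical data with ten occasions named like the columns in this answer:

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: 2 persons x 10 occasions, named "trait A time 1" ... "trait A time 10"
wide <- as.data.frame(matrix(1:20, nrow = 2))
names(wide) <- paste("trait A time", 1:10)

long <- wide %>%
  rename_with(.fn = function(x) gsub('trait A time', "", x)) %>%
  pivot_longer(cols = everything()) %>%
  # Drop the leading space and order occasions numerically (1, 2, ..., 10)
  mutate(time = factor(as.integer(trimws(name))))

levels(long$time)
```

Using time instead of name as the x aesthetic then gives the occasions in their natural order.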
Upvotes: 2