Reputation: 5088
I have a recurrent problem. I often have multiple vectors or columns in a data.frame representing conditions. For example:
condition_1 condition_2 condition_3
5.3 2.6 1.2
25.5 2.2 1.4
13.1 0.1 9.2
...
Often I want to compare these conditions using an ANOVA. However, most ANOVA functions need the data in long format, with the values in one column and the condition as a factor in another, like this:
value condition
5.3 condition_1
25.5 condition_1
13.1 condition_1
2.6 condition_2
2.2 condition_2
0.1 condition_2
1.2 condition_3
1.4 condition_3
9.2 condition_3
...
Is there a fast and easy way in R to convert from the former format to the latter?
Upvotes: 0
Views: 76
Reputation: 92282
Or using the tidyr package:
library(tidyr)
gather(dat, condition, value, condition_1:condition_3)
# condition value
# 1 condition_1 5.3
# 2 condition_1 25.5
# 3 condition_1 13.1
# 4 condition_2 2.6
# 5 condition_2 2.2
# 6 condition_2 0.1
# 7 condition_3 1.2
# 8 condition_3 1.4
# 9 condition_3 9.2
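In current tidyr releases gather() is marked as superseded in favour of pivot_longer(). A minimal sketch of the same reshape with it, assuming dat holds the three columns from the question (the names long, "condition" and "value" are just illustrative choices):

```r
library(tidyr)

# Example data from the question
dat <- data.frame(
  condition_1 = c(5.3, 25.5, 13.1),
  condition_2 = c(2.6, 2.2, 0.1),
  condition_3 = c(1.2, 1.4, 9.2)
)

# Reshape wide -> long: one row per (condition, value) pair
long <- pivot_longer(dat, cols = condition_1:condition_3,
                     names_to = "condition", values_to = "value")

# pivot_longer() walks the data row by row and returns the key as character,
# so sort by condition and convert to a factor before fitting an ANOVA
long <- long[order(long$condition), ]
long$condition <- factor(long$condition)
```

The explicit factor() call matters here because, unlike melt() or stack(), the key column comes back as a character vector.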
Upvotes: 3
Reputation: 78792
Alternate approach with melt from reshape2:
dat <- read.table(text="condition_1 condition_2 condition_3
5.3 2.6 1.2
25.5 2.2 1.4
13.1 0.1 9.2", stringsAsFactors=FALSE, header=TRUE)
library(reshape2)
dat_m <- melt(dat)
dat_m
## variable value
## 1 condition_1 5.3
## 2 condition_1 25.5
## 3 condition_1 13.1
## 4 condition_2 2.6
## 5 condition_2 2.2
## 6 condition_2 0.1
## 7 condition_3 1.2
## 8 condition_3 1.4
## 9 condition_3 9.2
str(dat_m)
## 'data.frame': 9 obs. of 2 variables:
## $ variable: Factor w/ 3 levels "condition_1",..: 1 1 1 2 2 2 3 3 3
## $ value : num 5.3 25.5 13.1 2.6 2.2 0.1 1.2 1.4 9.2
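If you want the column names from the question rather than the defaults, melt() accepts variable.name and value.name. A small sketch, assuming every column of dat is a measurement column (dat_long is just an illustrative name):

```r
library(reshape2)

# Example data from the question
dat <- data.frame(
  condition_1 = c(5.3, 25.5, 13.1),
  condition_2 = c(2.6, 2.2, 0.1),
  condition_3 = c(1.2, 1.4, 9.2)
)

# State the measure columns explicitly and name the two output columns
dat_long <- melt(dat, measure.vars = names(dat),
                 variable.name = "condition", value.name = "value")

str(dat_long)
```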
Upvotes: 3
Reputation: 99331
Sure. You can use stack. It's not necessarily "fast" but it sure is easy.
stack(df)
# values ind
# 1 5.3 condition_1
# 2 25.5 condition_1
# 3 13.1 condition_1
# 4 2.6 condition_2
# 5 2.2 condition_2
# 6 0.1 condition_2
# 7 1.2 condition_3
# 8 1.4 condition_3
# 9 9.2 condition_3
sapply(stack(df), class)
# values ind
# "numeric" "factor"
where df is
structure(list(condition_1 = c(5.3, 25.5, 13.1), condition_2 = c(2.6,
2.2, 0.1), condition_3 = c(1.2, 1.4, 9.2)), .Names = c("condition_1",
"condition_2", "condition_3"), class = "data.frame", row.names = c(NA,
-3L))
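Since the end goal was an ANOVA, here is a rough sketch of how the stacked result could be passed to aov() from base R. The renaming step and the object names long and fit are only illustrative, and whether a one-way ANOVA is the right model depends on your design:

```r
# Example data from the question
df <- data.frame(
  condition_1 = c(5.3, 25.5, 13.1),
  condition_2 = c(2.6, 2.2, 0.1),
  condition_3 = c(1.2, 1.4, 9.2)
)

# Wide -> long; stack() returns the columns "values" and "ind"
long <- stack(df)
names(long) <- c("value", "condition")

# One-way ANOVA of value by condition (condition is already a factor)
fit <- aov(value ~ condition, data = long)
summary(fit)
```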
Upvotes: 3