Reputation: 2372
I have some data that I am not sure how bet to analyze. It is currently in Excel and will need to fiddling to get to work in R, I am sure. I have a set of targets, their sizes and color. I also have users, the condition and their score for each target.
So the first table look like this:
Target, 1, 2, 3, 4, 5 ...
Size, L, M, L, S, L ...
Color R, B, G, B, R ...
Then I have all the user data that has a column for the user id, a column for the device, then a column for the score on each target.
User, Condition, 1, 2, 3, ...
1 A 5, 2, 8, ...
1 D 2, 4, 6, ...
2 A 1, 4, 6, ...
2 B 5, 8, 3, ...
I mainly want to run an ANOVA between the 4 conditions so see if the mean scores are the same on L targets, or R targets for example.
I have never had to use a 2nd table to filter or look up data like this. How do I do this?
Upvotes: 2
Views: 460
Reputation: 83215
A possible alternative solution is joining the look-up table with the dataframe:
1. Some example data (same as @chl used in his answer, but with a dataframe instead of a list for the look-up values):
lut <- data.frame(Target=1:5, Size=c("L","M","L","S","L"), Color=c("R","B","G","B","R"))
df1 <- data.frame(rep(1:2, each=2), c("A","D","A","B"),
c(5,2,1,5), c(2,4,4,8), c(8,6,6,3))
names(df1) <- c("user", "condition", 1:3)
2. with the data.table package you can transform the dataframe to a data.table and to long format (which works the same as with reshape2)
dt.melt <- melt(setDT(df1), id=c("user","condition"),
variable.factor = FALSE)[, variable := as.numeric(variable)]
3. join with the look-up table in order to add the matching values of Size
and Color
to the long data.table:
dt.melt[lut, on = c("variable" = "Target"), nomatch=0]
or:
lut[dt.melt, on = c("Target" = "variable")]
which both result in:
user condition variable value Size Color
1: 1 A 1 5 L R
2: 1 D 1 2 L R
3: 2 A 1 1 L R
4: 2 B 1 5 L R
5: 1 A 2 2 M B
6: 1 D 2 4 M B
7: 2 A 2 4 M B
8: 2 B 2 8 M B
9: 1 A 3 8 L G
10: 1 D 3 6 L G
11: 2 A 3 6 L G
12: 2 B 3 3 L G
You can also bind this together in one call:
dt.melt <- melt(setDT(df1), id=c("user","condition"),
variable.factor = FALSE)[, variable := as.numeric(variable)
][lut, on = c("variable" = "Target"), nomatch=0]
With the combination of dplyr and tidyr you can achieve the same:
library(dplyr)
library(tidyr)
df.new <- df1 %>%
gather(variable, value, -c(1:2)) %>%
mutate(variable = as.numeric(as.character(variable))) %>%
left_join(., lut, by = c("variable" = "Target"))
which will give the same result:
> df.new
user condition variable value Size Color
1 1 A 1 5 L R
2 1 D 1 2 L R
3 2 A 1 1 L R
4 2 B 1 5 L R
5 1 A 2 2 M B
6 1 D 2 4 M B
7 2 A 2 4 M B
8 2 B 2 8 M B
9 1 A 3 8 L G
10 1 D 3 6 L G
11 2 A 3 6 L G
12 2 B 3 3 L G
Upvotes: 2
Reputation: 29367
Quick and dirty solution (because I believe someone will certainly propose a more elegant solution avoiding loop):
tab1 <- list(Target=1:5, Size=c("L","M","L","S","L"), Color=c("R","B","G","B","R"))
tab2 <- data.frame(rep(1:2, each=2), c("A","D","A","B"),
c(5,2,1,5), c(2,4,4,8), c(8,6,6,3))
names(tab2) <- c("User", "Condition", 1:3)
library(reshape)
tab2.melt <- melt(tab2, measure.vars=3:5)
for (i in 1:nrow(tab2.melt)) {
tab2.melt$Size[i] <- tab1$Size[tab1$Target==as.numeric(tab2.melt$variable[i])]
tab2.melt$Color[i] <- tab1$Color[tab1$Target==as.numeric(tab2.melt$variable[i])]
}
I am assuming you are able to import your data into R, but you may want to adapt the above code if the data structure isn't the one you show in your excerpt. Basically, the idea is to consider your Target
code as a way to index Size
and Color
levels, which we need in the final data.frame
for each repeated measurement (on the ith subject).
The updated data.frame
looks like:
> head(tab2.melt)
User Condition variable value Size Color
1 1 A 1 5 L R
2 1 D 1 2 L R
3 2 A 1 1 L R
4 2 B 1 5 L R
5 1 A 2 2 M B
6 1 D 2 4 M B
From there, you can perform a 3-way ANOVA or study specific contrasts.
Upvotes: 2