Reputation: 1438
I have a wide-form table that looks like this:
ID Test_11 LVL11 Score_X_11 Score_Y_11 Test_12 LV12 Score_X_12 Score_Y_12
1 A I 100 NA NA NA 100 100
2 A II 90 100 B II 90 85
3 NA NA NA NA B II 90 NA
4 A III 100 80 A III 75 75
5 B I NA 90 NA NA 60 50
6 B I 70 100 NA NA NA NA
7 B II 85 NA A I 60 60
And a table used for sorting that looks like this:
Test_11 A
Test_11 B
Test_12 A
Test_12 B
What this second table tells us is that for Test_11 there are two versions, A and B (same for Test_12).
I am trying to create a series of boxplots, one for each combination of test (Test_11, Test_12) and version (A, B). For Test_11==A, for example, the boxplot would have three groups (I, II, III) and would be drawn from the subset of rows where Test_11==A; the same goes for Test_11==B, Test_12==A, and Test_12==B. In this example there should be 4 graphs in total.
What I have in R is:
z <- subset(df, df$Test_11=="A")
plot(z$LVL11, z$Score_X_11, varwidth = TRUE, notch = TRUE, xlab = 'LVL',
     ylab = 'score')
What I would like, and haven't been able to figure out how to do, is to write a for loop that does the subsetting for me so that I could automate this for my actual data set which has a few dozen of these combinations.
Thanks for any help and guidance.
Upvotes: 0
Views: 3254
Reputation: 1421
Maybe you should save all your logical vectors in a data.frame or matrix before the loop:
selections <- matrix(nrow = nrow(df), ncol = 4)
selections[,1] <- df$Test_11 == "A"
selections[,2] <- df$Test_11 == "B"
selections[,3] <- df$Test_12 == "A"
selections[,4] <- df$Test_12 == "B"
# etc...
par(mfcol = c(2, 2)) # here you should customize at will...
for (i in 1:4) {
  z <- subset(df, selections[,i])
  plot(z$LVL11, z$Score_X_11, varwidth = TRUE,
       notch = TRUE, xlab = 'LVL',
       ylab = 'score')
}
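With a few dozen combinations, filling the matrix one column at a time gets tedious. A minimal sketch of building it from the sorting table instead, assuming that table is held in a data frame called combos with columns test and version (both names are made up here, they are not in the question):

```r
# Only the two test columns from the question's example data.
df <- data.frame(
  Test_11 = c("A", "A", NA, "A", "B", "B", "B"),
  Test_12 = c(NA, "B", "B", "A", NA, NA, "A")
)

# Hypothetical representation of the sorting table.
combos <- data.frame(
  test    = c("Test_11", "Test_11", "Test_12", "Test_12"),
  version = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# One logical column per row of combos. %in% returns FALSE for NA,
# so the matrix contains no missing values.
selections <- mapply(
  function(test, version) df[[test]] %in% version,
  combos$test, combos$version
)

print(colSums(selections))  # number of rows selected per combination
```

Column i of selections then corresponds to row i of combos, so the loop can run over seq_len(ncol(selections)) instead of a hard-coded 1:4.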
You can change your code so that instead of using z$Score_X_11 you use z[,string]. The value of string should be constructed with paste (or other string manipulating functions). For example:
v <- c("X", "Y")
n <- c("11", "12")
for (i in 1:2) {
  for (j in 1:2) {
    # i indexes the score type (X, Y), j indexes the test number (11, 12)
    string <- paste("Score", v[i], n[j], sep = "_")
    print(string)
  }
}
Similar reasoning applies to the z$LVLXX values, so you should be able to figure out a way to accommodate that.
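Putting the pieces together, here is a rough sketch of one way the loop could look in base graphics. The names combos, lvl_col and subset_for are invented for this illustration; the lookup vector lvl_col is there because the level columns are named LVL11 and LV12 in the example, so they cannot be built from a single pattern. I left out notch = TRUE because with so few points per group it only produces warnings, and only Score_X is plotted.

```r
# Example data from the question; strings are read as factors so that
# plot(factor, numeric) draws boxplots.
df <- read.table(header = TRUE, na.strings = "NA", stringsAsFactors = TRUE, text = "
ID Test_11 LVL11 Score_X_11 Score_Y_11 Test_12 LV12 Score_X_12 Score_Y_12
1 A I 100 NA NA NA 100 100
2 A II 90 100 B II 90 85
3 NA NA NA NA B II 90 NA
4 A III 100 80 A III 75 75
5 B I NA 90 NA NA 60 50
6 B I 70 100 NA NA NA NA
7 B II 85 NA A I 60 60
")

# Hypothetical representation of the sorting table.
combos <- data.frame(test = c("Test_11", "Test_11", "Test_12", "Test_12"),
                     version = c("A", "B", "A", "B"),
                     stringsAsFactors = FALSE)

# Which level column belongs to which test column.
lvl_col <- c(Test_11 = "LVL11", Test_12 = "LV12")

# Rows where `test` equals `version`, reduced to a level and a score column.
subset_for <- function(test, version) {
  suffix <- sub("^Test_", "", test)                    # "11" or "12"
  score  <- paste("Score", "X", suffix, sep = "_")     # e.g. "Score_X_11"
  keep   <- !is.na(df[[test]]) & df[[test]] == version
  data.frame(level = droplevels(df[keep, lvl_col[[test]]]),
             score = df[keep, score])
}

op <- par(mfrow = c(2, 2))
for (i in seq_len(nrow(combos))) {
  z <- subset_for(combos$test[i], combos$version[i])
  plot(z$level, z$score, varwidth = TRUE, xlab = "LVL", ylab = "score",
       main = paste(combos$test[i], "==", combos$version[i]))
}
par(op)
```

Adding a row to combos (and an entry to lvl_col for any new test column) gives one more plot without touching the loop.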
I'm not very experienced with trellis graphics (as used in the other answer), but I know a little ggplot2, so I decided to take up the challenge and give it a try. It is not great, but at least it works:
# df <- read.table("data.txt", header = TRUE, na.strings = "NA")
require(reshape2)
require(ggplot2)
# Melt your data.frame, using the scores as the "values":
mdf <- melt(df[,-1], id = c("LVL11", "LV12", "Test_11", "Test_12"))
# loop through level types:
for (lvl in c("LVL11", "LV12")) {
  # looping through values of Test_11
  for (test11 in c("A", "B")) {
    # Note the use of subset before ggplot
    p <- ggplot(subset(mdf, Test_11 == test11), aes_string(x=lvl, y="value"))
    # I added the geom_jitter as in the example given there were only a few points
    g <- p + geom_boxplot(aes(fill = variable)) + geom_jitter(aes(shape = variable))
    print(g) # it is necessary to print g explicitly like this in order to use ggplot in a loop
    # Finally, save each plot with a relevant name:
    savePlot(paste0(lvl, "-t11", test11, ".png"))
    # (note that savePlot has some problems with RStudio iirc)
  }
  # Same as before, but with Test_12
  for (test12 in c("A", "B")) {
    p <- ggplot(subset(mdf, Test_12 == test12), aes_string(x=lvl, y="value"))
    g <- p + geom_boxplot(aes(fill = variable)) + geom_jitter(aes(shape = variable))
    print(g)
    savePlot(paste0(lvl, "-t12", test12, ".png"))
  }
}
If anyone knows how to use trellis graphics or maybe facet_grid in this case, so I can put all graphics in one image, I would love to hear how.
cheers.
Upvotes: 1
Reputation: 55695
Classic plyr solution (HT to @hadleywickham)
require(plyr); require(lattice); require(gridExtra)
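# df is the data frame from the question; this builds one lattice
# boxplot per (Test_11, Test_12) combination
bplots <- dlply(df, .(Test_11, Test_12), function(d) {
  bwplot(Score_X_11 ~ LVL11, data = d)
})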
do.call('grid.arrange', bplots)
Upvotes: 1