Reputation: 8447
I have a dataframe that looks like this:
step var1 score1 score2
1 a 0 0
2 b 1 1
3 d 1 1
4 e 0 0
5 g 0 0
1 b 1 1
2 a 1 0
3 d 1 0
4 e 0 1
5 f 1 1
1 g 0 1
2 d 1 1
etc.
Because I need to correlate variables a-g (their scores are in score1) with score2 in step 5 only, I think I need to reshape my dataset into this first:
a b c d e f g score2_step5
0 1 1 0 0 0
1 1 1 0 1 1
1 0
etc.
I am pretty sure that the Reshape package should be able to help me to do the job, but I haven't been able to make it work yet. Can anyone help me? Many thanks in advance!
Upvotes: 0
Views: 97
Reputation: 3866
It looks like you want 7 correlations between the variables a-g and score2_step5; is that correct? First, you're going to need another variable. I'm assuming that step repeats continuously from 1 to 5; if not, this is going to be more complicated. I'm assuming your data is called df. I also prefer the newer reshape2 package, so I'm using that.
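library(reshape2)
# label each run of 5 rows as one block (assumes step cycles 1..5)
df$block <- rep(1:(nrow(df)/5),each=5)
df.molten <- melt(df,id.vars=c("var1", "step", "block"),measure.vars=c("score1"))
# one row per block, one column per var1 value, holding score1
df2 <- dcast(df.molten, block ~ var1)
score2_step5 <- df$score2[df$step==5]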
and then finally
cor(df2, score2_step5, use='pairwise')
There's an extra column (block) in df2 that you can get rid of or just ignore.
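The block index above only lines up if step really does cycle 1 to 5 without gaps. A minimal sketch of how that assumption could be checked before reshaping, using the first two complete blocks of the question's data; the name steps_ok is hypothetical, and the fallback of starting a new block at every step == 1 assumes each block has a step-1 row:

```r
# First two complete blocks from the question's example data
df <- data.frame(
  step   = rep(1:5, times = 2),
  var1   = c("a", "b", "d", "e", "g", "b", "a", "d", "e", "f"),
  score1 = c(0, 1, 1, 0, 0, 1, 1, 1, 0, 1),
  score2 = c(0, 1, 1, 0, 0, 1, 0, 0, 1, 1)
)

# Check the assumption: step cycles 1..5 with no gaps
steps_ok <- nrow(df) %% 5 == 0 &&
  all(df$step == rep(1:5, length.out = nrow(df)))

# Index blocks by position if the check passes; otherwise start a new
# block at every step == 1 (assumes each block contains a step-1 row)
df$block <- if (steps_ok) {
  rep(seq_len(nrow(df) / 5), each = 5)
} else {
  cumsum(df$step == 1)
}
```

If the check fails, the positional index would silently mix rows from different blocks, so the fallback is the safer choice for incomplete data.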
Upvotes: 1
Reputation: 118889
Here's another version. If a block has no step = 5, score2_step5 is set to 0. Assuming your data.frame is df:
require(reshape2)
out <- do.call(rbind, lapply(seq(1, nrow(df), by=5), function(ix) {
    iy <- min(ix+4, nrow(df))
    df.b <- df[ix:iy, ]
    tt <- dcast(df.b, 1 ~ var1, fill = 0, value.var = "score1", drop=FALSE)
    tt$score2_step5 <- 0
    if (any(df.b$step == 5)) {
        tt$score2_step5 <- df.b$score2[df.b$step == 5]
    }
    tt[,-1]
}))
> out
a b d e f g score2_step5
2 0 1 1 0 0 0 0
21 1 1 1 0 1 0 1
22 0 0 1 0 0 0 0
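One thing to keep in mind with fill = 0 is that a variable missing from a block is then treated as a score of 0, which can change the correlation compared with leaving it as NA and using pairwise-complete observations. A small sketch with made-up vectors (x_na, x_zero and y are hypothetical, not taken from the question's data):

```r
# Hypothetical scores for one variable across six blocks; NA = variable absent
x_na   <- c(1, NA, 0, 1, NA, 0)
# Same scores with absent blocks filled with 0
x_zero <- ifelse(is.na(x_na), 0, x_na)
# Hypothetical score2 at step 5 for the same six blocks
y <- c(1, 1, 0, 1, 0, 0)

cor_na   <- cor(x_na, y, use = "pairwise.complete.obs")  # 1
cor_zero <- cor(x_zero, y)                               # 0.7071068
```

Which treatment is appropriate depends on whether "not presented in this block" should count as a score of 0 in your analysis.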
Upvotes: 2
Reputation: 226871
I added another row to your example data because my code doesn't work unless there is a step-5 observation in every block.
dat <- read.table(textConnection("
step var1 score1 score2
1 a 0 0
2 b 1 1
3 d 1 1
4 e 0 0
5 g 0 0
1 b 1 1
2 a 1 0
3 d 1 0
4 e 0 1
5 f 1 1
1 g 0 1
2 d 1 1
5 a 1 0"),header=TRUE)
Like @JonathanChristensen, I made another variable (I called it rep instead of block), and I made var1 into a factor (since there are no c values in the example data set given and I wanted a placeholder).
dat <- transform(dat,var1=factor(var1,levels=letters[1:7]),
rep=cumsum(step==1))
tapply makes the table of score1 values:
tab <- with(dat,tapply(score1,list(rep,var1),identity))
Add the score2, step-5 values:
data.frame(tab,subset(dat,step==5,select=score2))
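To get from this table to the correlations the question asks for, one possible final step is to pass the table and the step-5 score2 values to cor. The sketch below is only an illustration under assumptions: it uses simulated data (200 blocks, 5 of the 7 variables drawn at random per block, random 0/1 scores) because the three blocks in the example are too few to give meaningful correlations, and n_blocks and cors are hypothetical names.

```r
set.seed(1)
n_blocks <- 200

# Simulated data in the same long layout as dat (assumption, not real data)
dat <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
  data.frame(rep    = b,
             step   = 1:5,
             var1   = sample(letters[1:7], 5),
             score1 = rbinom(5, 1, 0.5),
             score2 = rbinom(5, 1, 0.5))
}))
dat$var1 <- factor(dat$var1, levels = letters[1:7])

# Wide table of score1: one row per block, one column per variable a-g
tab <- with(dat, tapply(score1, list(rep, var1), identity))

# score2 at step 5, one value per block, in the same block order
score2_step5 <- subset(dat, step == 5)$score2

# One correlation per variable, skipping blocks where the variable is absent
cors <- cor(tab, score2_step5, use = "pairwise.complete.obs")[, 1]
```

With real data, a column that never varies within its observed blocks would give NA (with a warning about zero standard deviation), so it is worth checking the result for NA values.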
Upvotes: 0