rdatasculptor

Reputation: 8447

How can I reshape my dataframe using the reshape package?

I have a dataframe that looks like this:

step  var1  score1  score2
1      a    0        0
2      b    1        1
3      d    1        1
4      e    0        0
5      g    0        0
1      b    1        1
2      a    1        0
3      d    1        0
4      e    0        1
5      f    1        1
1      g    0        1
2      d    1        1
etc.

Because I need to correlate variables a-g (their scores are in score1) with score2 at step 5 only, I think I need to change my dataset into this first:

a   b   c   d   e   f   g   score2_step5
0   1       1   0       0   0
1   1       1   0   1       1
            1           0 
etc.

I am pretty sure that the Reshape package should be able to help me to do the job, but I haven't been able to make it work yet. Can anyone help me? Many thanks in advance!

Upvotes: 0

Views: 97

Answers (3)

Jonathan Christensen

Reputation: 3866

It looks like you want 7 correlations, one between each of the variables a-g and score2_step5. Is that correct? First, you're going to need another variable that identifies each block of steps. I'm assuming that step repeats continuously from 1 to 5; if not, this is going to be more complicated. I'm assuming your data is called df. I also prefer the newer reshape2 package, so I'm using that.

library(reshape2)
df$block <- rep(1:(nrow(df)/5),each=5)
df.molten <- melt(df,id.vars=c("var1", "step", "block"),measure.vars=c("score1"))
df2 <- dcast(df.molten, block ~ var1)
score2_step5 <- df$score2[df$step==5]

and then finally

cor(df2, score2_step5, use='pairwise')

There's an extra column (block) in df2 that you can get rid of or just ignore.
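
The block index above relies on step cycling 1 to 5 with no partial final block. As a rough sketch in base R, that assumption can be checked before reshaping, and a block index that does not depend on the block length can be built from the rows where step is 1. The names is_regular, block_by_start and block_by_rep are illustrative only, and the data are the first ten rows from the question.

```r
# First ten rows of the question's data (two complete blocks of five steps)
df <- data.frame(
  step   = rep(1:5, times = 2),
  var1   = c("a", "b", "d", "e", "g", "b", "a", "d", "e", "f"),
  score1 = c(0, 1, 1, 0, 0, 1, 1, 1, 0, 1),
  score2 = c(0, 1, 1, 0, 0, 1, 0, 0, 1, 1)
)

# TRUE only if step runs 1,2,3,4,5,1,2,... and the last block is complete
is_regular <- nrow(df) %% 5 == 0 &&
  all(df$step == rep(1:5, length.out = nrow(df)))

# Block index that starts a new block wherever step == 1;
# it does not require every block to have five rows
block_by_start <- cumsum(df$step == 1)

# Block index as in the answer; only valid when is_regular is TRUE
block_by_rep <- rep(1:(nrow(df) / 5), each = 5)
```

When is_regular is TRUE the two indices agree; when it is FALSE, rep(..., each=5) would mislabel rows (or fail on a length mismatch), and the cumsum version is the safer choice.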

Upvotes: 1

Arun

Reputation: 118889

Here's another version. If a block has no step = 5 row, score2_step5 is set to 0 for that block. Assuming your data.frame is df:

require(reshape2)
out <- do.call(rbind, lapply(seq(1, nrow(df), by=5), function(ix) {
    iy <- min(ix+4, nrow(df))
    df.b <- df[ix:iy, ]
    tt <- dcast(df.b, 1 ~ var1, fill = 0, value.var = "score1", drop = FALSE)
    tt$score2_step5 <- 0
    if (any(df.b$step == 5)) {
        tt$score2_step5 <- df.b$score2[df.b$step == 5]
    }
    tt[,-1]
}))

> out
   a b d e f g score2_step5
2  0 1 1 0 0 0            0
21 1 1 1 0 1 0            1
22 0 0 1 0 0 0            0

Upvotes: 2

Ben Bolker

Reputation: 226871

I added another row to your example data because my code doesn't work unless there is a step-5 observation in every block.

dat <- read.table(textConnection("
step  var1  score1  score2
1      a    0        0
2      b    1        1
3      d    1        1
4      e    0        0
5      g    0        0
1      b    1        1
2      a    1        0
3      d    1        0
4      e    0        1
5      f    1        1
1      g    0        1
2      d    1        1
5      a    1        0"),header=TRUE)

Like @JonathanChristensen, I made another variable (I called it rep instead of block), and I made var1 into a factor (since there are no c values in the example data set given and I wanted a placeholder).

dat <- transform(dat,var1=factor(var1,levels=letters[1:7]),
                 rep=cumsum(step==1))

tapply makes the table of score1 values, with one row per rep and one column per level of var1:

tab <- with(dat,tapply(score1,list(rep,var1),identity))

Add the score2 values from step 5:

data.frame(tab,subset(dat,step==5,select=score2))
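
To get from this wide table to the correlations the question asks for, something like the following base-R sketch might work. It rebuilds the same 13-row example (using read.table(text = ...) rather than textConnection), and the names score2_step5 and r are illustrative. With only three blocks, most columns are too sparse or constant to give a meaningful correlation, so they come out as NA.

```r
dat <- read.table(header = TRUE, text = "
step var1 score1 score2
1 a 0 0
2 b 1 1
3 d 1 1
4 e 0 0
5 g 0 0
1 b 1 1
2 a 1 0
3 d 1 0
4 e 0 1
5 f 1 1
1 g 0 1
2 d 1 1
5 a 1 0")

# Same preparation as above: fixed factor levels a-g and a block counter
dat$var1 <- factor(dat$var1, levels = letters[1:7])
dat$rep  <- cumsum(dat$step == 1)

# One row per block, one column per variable; missing cells are NA
tab <- with(dat, tapply(score1, list(rep, var1), identity))

# score2 at step 5, one value per block (assumes every block has a step 5)
score2_step5 <- dat$score2[dat$step == 5]

# Correlate each column with score2_step5 using the pairs that are present.
# suppressWarnings hides the zero-variance warnings expected on tiny data.
r <- suppressWarnings(cor(tab, score2_step5, use = "pairwise.complete.obs"))
```

The result r is a one-column matrix with a row for each of a through g; on real data with many blocks, the NA entries should become rarer.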

Upvotes: 0
