Garrett

Reputation: 45

Commands to transform data.frame in R

I'm struggling to come up with the correct way to transform some data I'm analyzing without resorting to a scripting language.

The data has a format similar to the following:

data.frame(Group=LETTERS[1:3],Total=c(100,120,130),Modified=c(12,15,32))

  Group Total Modified
1     A   100       12
2     B   120       15
3     C   130       32

I'd like the resulting data frame to look like this:

    +-------+----------+
    | Group | Modified |
    +-------+----------+
    | A     | Y        |
    | A     | Y        |
    | A     | Y        |
    | .     | .        |
    | .     | .        |
    | .     | .        |
    | A     | N        |
    | A     | N        |
    | B     | Y        |
    | B     | Y        |
    | .     | .        |
    | .     | .        |
    | .     | .        |
    | B     | N        |
    +-------+----------+

There should be 12 rows with Group A and Modified = Y, and 88 rows with Group A and Modified = N. The same goes for B, C, etc.

In most cases there are additional columns that will need to be repeated on each row along with the Group info.

Upvotes: 4

Views: 174

Answers (3)

Tyler Rinker

Reputation: 109864

Slightly different approach:

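dat <- data.frame(Group=LETTERS[1:3],Total=c(100,120,130),Modified=c(12,15,32))

# number of unmodified rows per group
dat$diff <- dat$Total - dat$Modified
library(reshape2)
# long format: one row per Group and count type (Modified, diff); Total is dropped
dat2 <- melt(dat[, -2], id.vars = "Group")
dat2 <- dat2[order(dat2$Group), ]
levels(dat2$variable) <- c("Y", "N")
# repeat each row `value` times, then drop the count column
dat2 <- dat2[rep(1:nrow(dat2), dat2$value), -3]
rownames(dat2) <- NULL
# rename `variable` to match the column name in the desired output
names(dat2) <- c("Group", "Modified")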

Upvotes: 0

thelatemail

Reputation: 93813

Code to convert:

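# `test` is the data frame from the question
test <- data.frame(Group=LETTERS[1:3],Total=c(100,120,130),Modified=c(12,15,32))

# for each group, build Modified "Y" rows followed by (Total - Modified) "N" rows
result <- do.call(rbind,
                  by(test,
                     test$Group,
                     function(x)
                       data.frame(
                         Group=x$Group[1],
                         Modified=rep(c("Y","N"),c(x$Modified,x$Total - x$Modified))
                       )
                  )
          )

The output looks like: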

> head(result)
    Group Modified
A.1     A        Y
A.2     A        Y
A.3     A        Y
A.4     A        Y
A.5     A        Y
A.6     A        Y

Checking it worked:

> with(result,table(Group,Modified))
     Modified
Group   N   Y
    A  88  12
    B 105  15
    C  98  32

Upvotes: 3

mnel

Reputation: 115392

You can use rep with the appropriate times argument.
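
As a rough base R sketch of that idea (not part of the original answer): the data frame name `DF` follows the comment in the code further down, while `keep`, `expanded` and `counts` are names made up here for illustration. It assumes the count columns are called `Total` and `Modified`, as in the question, and carries every other column along on each repeated row.

```r
# Example data from the question
DF <- data.frame(Group = LETTERS[1:3],
                 Total = c(100, 120, 130),
                 Modified = c(12, 15, 32))

# Columns to repeat on every row (everything except the two count columns)
keep <- setdiff(names(DF), c("Total", "Modified"))

# Repeat each row index Total times
expanded <- DF[rep(seq_len(nrow(DF)), times = DF$Total), keep, drop = FALSE]

# Interleave the counts per group: Modified, then Total - Modified
counts <- as.vector(rbind(DF$Modified, DF$Total - DF$Modified))

# For each group: "Y" repeated Modified times, then "N" for the remainder
expanded$Modified <- rep(rep(c("Y", "N"), times = nrow(DF)), times = counts)
rownames(expanded) <- NULL
```

Running `with(expanded, table(Group, Modified))` afterwards is a quick way to confirm the counts per group.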

A data.table solution for coding elegance

library(data.table)
# your data is in the data.frame DF
DF <- data.table(DF)
levels <- c('Y', 'N')
DF[,list(Modified = rep(levels,c(Modified,Total-Modified))),by = Group]

Upvotes: 10
