States.the.Obvious

Reputation: 163

How to reshape the following dataframe in R

I have the following dataframe:

Original:

ID  C1  C2  C3  C4  C5  C6  C7  C8
A11 0   1   0   0   0   0   1   0
A21 0   0   1   1   0   0   0   0
A31 0   0   0   0   1   0   1   0
A41 0   0   0   0   0   1   0   0
A51 0   0   0   0   0   1   0   0
A61 0   0   0   0   0   1   0   1
A71 0   0   1   1   0   0   0   0
A81 1   0   0   1   0   0   1   0
A91 0   1   0   1   0   0   0   1
A10 1   0   1   0   0   1   0   1

I would ultimately like to have the data in the following format:

Final:

A11 C2  C7
A21 C3  C4
A31 C5  C7
A41 C6
A51 C6
A61 C6  C8
A71 C3  C4
A81 C1  C4  C7
A91 C2  C4  C8
A10 C1  C3  C6  C8

So essentially, wherever the value != 0, replace that value with the name of the variable in that column. Is there a way to do the above in R?

Thank you!

Upvotes: 1

Views: 55

Answers (1)

lmo

Reputation: 38500

Here is a method using apply that returns a list where the list item names are the row names:

# construct reproducible example
set.seed(1234)
df <- data.frame(apple=sample(c(0,1), 10, replace=TRUE),
                 banana=sample(c(0,1), 10, replace=TRUE),
                 carrot=sample(c(0,1), 10, replace=TRUE))
# give it some row names
rownames(df) <- letters[1:10]

# return the list
myList <- apply(df, 1, function(i) names(df)[i!=0])
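
As a rough sketch of how this might carry over to the layout in the question, where ID is an ordinary column rather than the row names: the snippet below uses three of the question's rows (A11, A41, A81) and moves ID into the row names before calling apply. The helper name `vals` is made up for illustration, and the three rows were picked because they have different numbers of non-zero entries.

```r
# Three rows from the question's data; ID is a regular column here
df <- data.frame(
  ID = c("A11", "A41", "A81"),
  C1 = c(0, 0, 1), C2 = c(1, 0, 0), C3 = c(0, 0, 0), C4 = c(0, 0, 1),
  C5 = c(0, 0, 0), C6 = c(0, 1, 0), C7 = c(1, 0, 1), C8 = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Keep only the indicator columns so apply() works on a numeric matrix,
# and carry the IDs over as row names (vals is a hypothetical helper name)
vals <- df[-1]
rownames(vals) <- df$ID

# One character vector of column names per row
myList <- apply(vals, 1, function(i) names(vals)[i != 0])
```

Dropping the ID column first matters because apply converts the data frame to a matrix, and a character column would turn every value into a string.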

This method only returns a list when the rows differ in how many non-zero values they contain. That is because apply (like many R functions) tries to simplify its output: if every row yields a result of the same length, the results are combined into a matrix or a vector instead. The example that @digemall provides,

df <- structure(list(ID = c("A11", "A21", "A31", "A41", "A51", "A61" ), 
                     C1 = c(1, 1, 1, 1, 1, 1), C2 = c(0, 0, 0, 0, 0, 0)),
                .Names = c("ID", "C1", "C2"), row.names = c(NA, 6L), class = "data.frame")

returns a matrix, which is useful in that it provides the desired information, but is not the list type object that was expected. An even more insidious example is the following:

df <- data.frame(apple=c(0,1), banana=c(1,0))

where the method will return a useless character vector.
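
To make the three outcomes easier to compare, here is a small sketch that wraps the apply call in a helper and runs it on three toy data frames. The helper `extract` and the data frames `varied`, `uniform`, and `single` are invented for this illustration and are not part of the original answer.

```r
# Hypothetical helper wrapping the apply-based method
extract <- function(d) apply(d, 1, function(i) names(d)[i != 0])

# Rows have different numbers of non-zero entries: stays a list
varied <- data.frame(apple = c(1, 0), banana = c(1, 1))

# Every row has exactly two non-zero entries: simplified to a matrix
uniform <- data.frame(apple = c(1, 1), banana = c(1, 1))

# Every row has exactly one non-zero entry: simplified to a character vector
single <- data.frame(apple = c(0, 1), banana = c(1, 0))

is.list(extract(varied))
is.matrix(extract(uniform))
is.character(extract(single))
```

Which shape comes back depends only on the lengths of the per-row results, so the same code can behave differently from one dataset to the next.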

A safer method, which @digemall suggests, is to use lapply to loop over the rows. Because lapply always returns a list, we don't have to worry about either of the previous concerns:

myList <- lapply(seq_len(nrow(df)), function(i) names(df)[df[i, ] == 1])

Now we have to add back the names:

names(myList) <- row.names(df)
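
Putting the lapply approach together on the full data from the question, one possible end-to-end sketch looks like the following. It assumes the table is read with read.table, tests for != 0 as the question describes, and uses the made-up names `cols` and `final`; the last step, which pastes each ID and its column names into one string, is an addition for illustration rather than part of the answer above.

```r
# The question's data, read from inline text
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID  C1  C2  C3  C4  C5  C6  C7  C8
A11 0   1   0   0   0   0   1   0
A21 0   0   1   1   0   0   0   0
A31 0   0   0   0   1   0   1   0
A41 0   0   0   0   0   1   0   0
A51 0   0   0   0   0   1   0   0
A61 0   0   0   0   0   1   0   1
A71 0   0   1   1   0   0   0   0
A81 1   0   0   1   0   0   1   0
A91 0   1   0   1   0   0   0   1
A10 1   0   1   0   0   1   0   1
")

# Indicator columns only (cols is a hypothetical helper name)
cols <- setdiff(names(df), "ID")

# lapply always returns a list, one element per row
myList <- lapply(seq_len(nrow(df)), function(i) cols[df[i, cols] != 0])
names(myList) <- df$ID

# Optional: one string per row, ID followed by its column names
final <- paste(names(myList), vapply(myList, paste, character(1), collapse = " "))
```

Here `myList[["A81"]]` holds the three column names for that row, and `final` is a character vector with one entry per ID.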

Upvotes: 4
