Reputation: 163
I have the following dataframe:
Original:
ID C1 C2 C3 C4 C5 C6 C7 C8
A11 0 1 0 0 0 0 1 0
A21 0 0 1 1 0 0 0 0
A31 0 0 0 0 1 0 1 0
A41 0 0 0 0 0 1 0 0
A51 0 0 0 0 0 1 0 0
A61 0 0 0 0 0 1 0 1
A71 0 0 1 1 0 0 0 0
A81 1 0 0 1 0 0 1 0
A91 0 1 0 1 0 0 0 1
A10 1 0 1 0 0 1 0 1
I would ultimately like to have the data in the following format:
Final:
A11 C2 C7
A21 C3 C4
A31 C5 C7
A41 C6
A51 C6
A61 C6 C8
A71 C3 C4
A81 C1 C4 C7
A91 C2 C4 C8
A10 C1 C3 C6 C8
So essentially, wherever the value != 0, replace that value with the name of the variable in that column. Is there a way to do the above in R?
Thank you!
Upvotes: 1
Views: 55
Reputation: 38500
Here is a method using apply that returns a list whose item names are the row names:
# construct reproducible example
set.seed(1234)
df <- data.frame(apple=sample(c(0,1), 10, replace=TRUE),
                 banana=sample(c(0,1), 10, replace=TRUE),
                 carrot=sample(c(0,1), 10, replace=TRUE))
# give it some row names
rownames(df) <- letters[1:10]
# return the list
myList <- apply(df, 1, function(i) names(df)[i!=0])
When using this method, make sure there is sufficient variation in your data. This is because apply (like many R functions) tries to simplify the data type of its output: if every row yields the same number of names, the result is no longer a list. The example that @digemall provides,
df <- structure(list(ID = c("A11", "A21", "A31", "A41", "A51", "A61" ),
C1 = c(1, 1, 1, 1, 1, 1), C2 = c(0, 0, 0, 0, 0, 0)),
.Names = c("ID", "C1", "C2"), row.names = c(NA, 6L), class = "data.frame")
returns a matrix, which is useful in that it provides the desired information, but is not the list type object that was expected. An even more insidious example is the following:
df <- data.frame(apple=c(0,1), banana=c(1,0))
where the method will return a useless character vector.
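The three outcomes described above can be checked directly. The following is only a small sketch on made-up data; the helper nonzero_names and the data frames varied, uniform, and single are hypothetical names used for illustration, not part of the original posts:

```r
# Helper wrapping the apply() call from above (the name is illustrative)
nonzero_names <- function(d) apply(d, 1, function(i) names(d)[i != 0])

# Rows select different numbers of columns: apply() returns a list
varied <- data.frame(apple = c(1, 0), banana = c(1, 1))
res_varied <- nonzero_names(varied)
is.list(res_varied)      # TRUE

# Every row selects exactly two columns: simplified to a character matrix
uniform <- data.frame(apple  = c(1, 1, 1),
                      banana = c(1, 1, 1),
                      carrot = c(0, 0, 0))
res_uniform <- nonzero_names(uniform)
is.matrix(res_uniform)   # TRUE, with one column per original row

# Every row selects exactly one column: simplified to a character vector
single <- data.frame(apple = c(0, 1), banana = c(1, 0))
res_single <- nonzero_names(single)
is.list(res_single)      # FALSE
```

The shape of the result therefore depends on the data rather than on the code, which is why downstream code that assumes a list can break unexpectedly.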
A safer method, which @digemall suggests, is to use lapply to loop over the rows. Because lapply always returns a list, we don't have to worry about either of the previous concerns:
myList <- lapply(seq_len(nrow(df)), function(i) names(df)[df[i, ] != 0])
Now we have to add back the names:
names(myList) <- row.names(df)
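For completeness, here is a sketch of the lapply approach applied to the data from the question. It assumes the data are read with an ID column followed by the 0/1 columns; the names original, flags, and final_lines are illustrative, and pasting the names into one string per row is just one possible way to get the "Final" layout:

```r
# Data from the question, read from inline text
original <- read.table(text = "
ID C1 C2 C3 C4 C5 C6 C7 C8
A11 0 1 0 0 0 0 1 0
A21 0 0 1 1 0 0 0 0
A31 0 0 0 0 1 0 1 0
A41 0 0 0 0 0 1 0 0
A51 0 0 0 0 0 1 0 0
A61 0 0 0 0 0 1 0 1
A71 0 0 1 1 0 0 0 0
A81 1 0 0 1 0 0 1 0
A91 0 1 0 1 0 0 0 1
A10 1 0 1 0 0 1 0 1
", header = TRUE, stringsAsFactors = FALSE)

# Keep only the 0/1 columns so the ID is not compared against 0
flags <- original[, -1]

# One list element per row, holding the names of the non-zero columns
myList <- lapply(seq_len(nrow(flags)),
                 function(i) names(flags)[unlist(flags[i, ]) != 0])
names(myList) <- original$ID

# Collapse each element into a single line such as "A11 C2 C7"
final_lines <- paste(names(myList),
                     vapply(myList, paste, character(1), collapse = " "))
writeLines(final_lines)
```

Keeping myList as a list is convenient for further processing, while final_lines is only meant for display.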
Upvotes: 4