Reputation:
I have a data set as follows:
A B C
R1 1 0 1
R2 0 1 0
R3 0 0 0
I want to add another column to the data set, named Index, that lists for each row the names of the columns whose value is greater than zero. The result I want is as follows:
A B C Index
R1 1 0 1 A,C
R2 0 1 0 B
R3 0 0 0 NA
Upvotes: 1
Views: 4459
Reputation: 19716
Here is one approach using base R:
Use apply to go over the rows, find the elements that are equal to 1, and paste together the corresponding column names:
df$Index <- apply(df, 1, function(x) paste(colnames(df)[which(x == 1)], collapse = ", "))
df$Index <-
- creates a new column called Index, where the result of the operation will be held
apply
- applies a function over the rows and/or columns of a matrix/data frame
1
- specifies that the function should be applied to rows (2 means over columns)
function(x)
- an unnamed function, defined right after it; x corresponds to each row
x == 1
- tests which elements of a row are equal to 1; the output is TRUE/FALSE
which(x == 1)
- returns the positions of the elements for which x == 1 is TRUE
colnames(df)
- the names of the columns of the data frame
colnames(df)[which(x == 1)]
- subsets the column names at the positions where x == 1 is TRUE
paste with collapse = ", "
- collapses a character vector (in this case the vector of column names acquired before) into a single string in which the elements are separated by ", "
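To make the breakdown above concrete, here is a small sketch that evaluates each step for a single row. The vector x is a hypothetical stand-in for row R1 of the example data, and the intermediate names (is_one, idx, picked, label, empty) are illustrative only; they do not appear in the answer's code.

```r
# Hypothetical stand-in for row R1 of the example data (A = 1, B = 0, C = 1)
x <- c(A = 1L, B = 0L, C = 1L)

is_one <- x == 1          # logical vector: TRUE, FALSE, TRUE
idx    <- which(is_one)   # integer positions of the TRUE values: 1 and 3
picked <- names(x)[idx]   # the matching names: "A" and "C"
label  <- paste(picked, collapse = ", ")  # a single string: "A, C"

# For a row with no matches, paste() receives an empty character vector
# and returns the empty string "", which is why a clean-up step is needed
empty <- paste(character(0), collapse = ", ")

print(label)
```

The last line of the sketch shows where the empty strings in the Index column come from when a row contains only zeros.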
Now replace the empty entries with NA:
df$Index[df$Index == ""] <- NA_character_
Here is how the output looks:
#output
sample A B C Index
1 R1 1 0 1 A, C
2 R2 0 1 0 B
3 R3 0 0 0 <NA>
data:
df <- structure(list(sample = structure(1:3, .Label = c("R1", "R2",
"R3"), class = "factor"), A = c(1L, 0L, 0L), B = c(0L, 1L, 0L
), C = c(1L, 0L, 0L)), .Names = c("sample", "A", "B", "C"), class = "data.frame", row.names = c(NA,
-3L))
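The question asks for values greater than zero, while the code above tests for equality with 1; for 0/1 data the two give the same result. Below is a minimal sketch of a variant that tests x > 0 and looks only at the value columns, assuming the example data is stored in df. The name value_cols is illustrative and not part of the answer.

```r
# Example data, rebuilt with data.frame(); storing it in df is an assumption
df <- data.frame(
  sample = factor(c("R1", "R2", "R3")),
  A = c(1L, 0L, 0L),
  B = c(0L, 1L, 0L),
  C = c(1L, 0L, 0L)
)

# Illustrative name: the columns whose values should be tested
value_cols <- c("A", "B", "C")

# Passing only the numeric columns keeps apply() from converting
# every value to character because of the factor column
df$Index <- apply(df[value_cols], 1, function(x) {
  paste(value_cols[x > 0], collapse = ", ")
})

# Rows with no positive value produce "", which is replaced by NA
df$Index[df$Index == ""] <- NA_character_

print(df)
```

Restricting apply() to the value columns also means the line can be re-run after Index has been added without the new column being tested as well.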
Upvotes: 2
Reputation: 33508
A slightly different flavor of the apply() solution:
df$index <- apply(df, 1, function(x) ifelse(any(x), toString(names(df)[x == 1]), NA))
A B C index
R1 1 0 1 A, C
R2 0 1 0 B
R3 0 0 0 <NA>
data:
df <- structure(
list(
A = c(1L, 0L, 0L),
B = c(0L, 1L, 0L),
C = c(1L, 0L, 0L)
),
row.names = paste0('R', 1:3),
class = "data.frame"
)
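As a side note on the pieces used here: toString() on a character vector is shorthand for paste() with collapse = ", ", so this answer builds the same strings as the paste() approach. The short sketch below checks that, and also shows why the any(x) guard is useful. The variable names are illustrative only.

```r
# Illustrative vector of selected column names
picked <- c("A", "C")

via_tostring <- toString(picked)               # "A, C"
via_paste    <- paste(picked, collapse = ", ") # "A, C"

# Both calls produce the same string
same <- identical(via_tostring, via_paste)

# With nothing selected, toString() returns "" rather than NA,
# which is why the answer checks any(x) and returns NA explicitly
no_match <- toString(character(0))

print(same)
```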
Upvotes: 1