Reputation: 61
I have a data frame (df) with 7 rows and 4 columns (named c1, c2, c3, c4):
c1 c2 c3 c4
Yes No Yes No
Yes Yes No No
No Yes No No
Yes No No No
Yes No Yes No
Yes No No No
No No Yes No
I want to add a 5th column named Expected Result that lists, for each row, the names of the columns (1 to 4) whose value is "Yes". For example, row 1 has "Yes" in Column 1 and Column 3, so to populate Expected Result I would concatenate the Column 1 name and the Column 3 name.
Here are the full expected results:
c1, c3
c1, c2
c2
c1
c1, c3
c1
c3
I have the following line of code but something is not quite right:
df$Expected_Result <- colnames(df)[apply(df,1,which(LETTERS="Unfit"))]
Upvotes: 6
Views: 3779
Reputation: 1378
You could try something like:
colnames(df) <- c("c1", "c2", "c3", "c4")
test <- (apply(df,1,function(x) which(x=="Yes")))
df$cols <- lapply(test,names)
I think this is along the lines of what you were initially trying.
To tidy the output you could:
df$cols <- gsub("c(", "", df$cols, fixed = TRUE)
df$cols <- gsub(")", "", df$cols, fixed = TRUE)
This removes the c() wrapper.
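A possible shortcut, assuming a plain character column is wanted rather than a list column: collapse each row's matching names with toString inside apply, so there is no c() to strip afterwards (as far as I can tell, the gsub route can also leave the quotes from the deparsed vector behind). The small df below is only a stand-in for the question's data.

```r
# Stand-in data with the same layout as the question (first three rows only)
df <- data.frame(c1 = c("Yes", "Yes", "No"),
                 c2 = c("No", "Yes", "Yes"),
                 c3 = c("Yes", "No", "No"),
                 c4 = c("No", "No", "No"),
                 stringsAsFactors = FALSE)

# For each row, keep the names of the columns equal to "Yes"
# and join them into a single comma-separated string
df$cols <- unname(apply(df, 1, function(x) toString(names(x)[x == "Yes"])))
```

Rows with no "Yes" come out as an empty string with this version, so an NA check would still be needed if that matters.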
Upvotes: 2
Reputation: 4473
An option using data.table
library(data.table)
setDT(df)[, rownum:=1:.N,]
df$Expected_result <- melt(df, "rownum")[,
toString(variable[value=="Yes"]), rownum]$V1
Upvotes: 5
Reputation: 887901
We can loop (apply) through the rows (MARGIN=1) of the logical matrix (df=='Yes'), convert to a numeric index (which), get the names, and paste them together with toString, which is a wrapper for paste(., collapse=', '). We may also need an if/else condition to check whether there are any 'Yes' values in a row. If not, it should return NA.
df$Expected_Result <- apply(df=='Yes', 1, function(x) {
  if(any(x)) {
    toString(names(which(x)))
  } else NA
})
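To see why the if/else guard is there, here is a small sketch on two hand-made logical rows (the vectors x and none are made up for illustration, not taken from the question's data). It checks that toString matches paste with collapse, and shows what comes back when a row has no TRUE at all.

```r
# A row with two matches: which() keeps the names of the TRUE elements
x <- c(c1 = TRUE, c2 = FALSE, c3 = TRUE, c4 = FALSE)
picked <- names(which(x))

# toString() should give the same string as paste(..., collapse = ", ")
same <- identical(toString(picked), paste(picked, collapse = ", "))

# A row with no matches: nothing is selected, so toString() returns
# an empty string rather than NA, hence the explicit if/else above
none <- c(c1 = FALSE, c2 = FALSE, c3 = FALSE, c4 = FALSE)
empty <- toString(names(which(none)))
```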
Another option would be to get the row/column index with which by specifying arr.ind=TRUE. Grouped by the row of 'indx' (indx[,1]), we paste the column names of 'df' to get 'val'. Rows without any 'Yes' element are missing from 'val', so we look up each row number in names(val) with match, which gives NA for the missing rows. (Passing 'val' straight to ifelse would recycle it by position and misalign the results whenever a row is missing.)
indx <- which(df=='Yes', arr.ind=TRUE)
val <- tapply(names(df)[indx[,2]], indx[,1], FUN=toString)
df$Expected_Result <- as.vector(val[match(seq_len(nrow(df)), names(val))])
df <- structure(list(c1 = c("Yes", "Yes", "No", "Yes", "Yes", "Yes",
"No"), c2 = c("No", "Yes", "Yes", "No", "No", "No", "No"), c3 = c("Yes",
"No", "No", "No", "Yes", "No", "Yes"), c4 = c("No", "No", "No",
"No", "No", "No", "No")), .Names = c("c1", "c2", "c3", "c4"),
class = "data.frame", row.names = c(NA, -7L))
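As a quick sanity check, the apply approach can be compared against the expected results listed in the question. This is only a sketch: df is rebuilt here with data.frame instead of structure, and result, expected and matches are names made up for the check.

```r
# Same values as the data above, rebuilt with data.frame()
df <- data.frame(
  c1 = c("Yes", "Yes", "No", "Yes", "Yes", "Yes", "No"),
  c2 = c("No", "Yes", "Yes", "No", "No", "No", "No"),
  c3 = c("Yes", "No", "No", "No", "Yes", "No", "Yes"),
  c4 = rep("No", 7),
  stringsAsFactors = FALSE
)

# Row-wise: names of the columns that are "Yes", or NA if there are none
result <- unname(apply(df == "Yes", 1, function(x) {
  if (any(x)) toString(names(which(x))) else NA
}))

# Expected values typed in from the question
expected <- c("c1, c3", "c1, c2", "c2", "c1", "c1, c3", "c1", "c3")
matches <- identical(result, expected)
```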
Upvotes: 6