Reputation: 3396
I have a logical dataframe like:
> test
apple apple apple kiwi kiwi banana banana banana apple orange
1 FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE
2 TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE
3 FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
My aim is to combine the column with same column name. That's to say the output should be a dataframe with 4 column (apple, kiwi, banana, orange).
I tried :
testmerge <- df[, !duplicated(colnames(df))]
But the output is not what I look for. For each row given same column name, the output should be True as long there is at least 1 TRUE. For each row given same column name, the output should be False if there is 0 TRUE.
For intance first row first column is should be TRUE instead of FALSE.
Undesired testmerge
output:
apple kiwi banana orange
1 FALSE FALSE FALSE FALSE
2 TRUE TRUE TRUE FALSE
3 FALSE TRUE FALSE FALSE
Desired output:
apple kiwi banana orange
1 TRUE TRUE TRUE FALSE
2 TRUE TRUE TRUE FALSE
3 FALSE TRUE FALSE FALSE
Replicate dataframe:
test <- structure(list(apple = c(FALSE, TRUE, FALSE), apple = c(TRUE, TRUE,
FALSE), apple = c(FALSE, TRUE, FALSE), kiwi = c(FALSE, TRUE, TRUE
), kiwi = c(TRUE, TRUE, TRUE), banana = c(FALSE, TRUE, FALSE), banana = c(TRUE,
FALSE, FALSE), banana = c(TRUE, TRUE, FALSE), apple = c(TRUE, TRUE,
FALSE), orange = c(FALSE, FALSE, FALSE)), .Names = c("apple", "apple",
"apple", "kiwi", "kiwi", "banana", "banana", "banana", "apple", "orange"), row.names = c(NA,
-3L), class = "data.frame")
Upvotes: 2
Views: 190
Reputation: 56179
Using sapply and rowSums:
as.data.frame(
sapply(unique(colnames(test)),
function(i){
rowSums(test[, grepl(i, colnames(test)), drop = FALSE]) > 0})
)
#output
# apple kiwi banana orange
# 1 TRUE TRUE TRUE FALSE
# 2 TRUE TRUE TRUE FALSE
# 3 FALSE TRUE FALSE FALSE
We are subsetting datafame based on fruit names, then computing rowSums. TRUE is 1 and FALSE is 0, so rowSums of more than zero will have at least one TRUE value. I have drop = FALSE
, so the subset will stay as a dataframe in cases like orange
where there is only one column.
Note: If the data is long then Reduce() solution by @akrun works better, but if data is wide then rowSums() is more efficient.
Upvotes: 3
Reputation: 887168
Another option would be to split
the sequence of columns of the dataset by the names
of it into a list
, loop through the list
, subset based on the numeric index, use Reduce
to check whether there are any TRUE in each row.
sapply(split(seq_along(test), names(test)), function(i) Reduce(`|`, test[i]))
# apple banana kiwi orange
#[1,] TRUE TRUE TRUE FALSE
#[2,] TRUE TRUE TRUE FALSE
#[3,] FALSE FALSE TRUE FALSE
Upvotes: 2
Reputation: 92292
There maybe more efficient ways to achieve this, but here's a try
I would suggest to convert the column names to to unique ones using make.unique
, then convert to long format, check your condition by a row id and the column names (made unique again) and then convert back to a wide format, something like
library(data.table)
setnames(setDT(test), make.unique(names(test))) # Make column names unique
res <- melt(test[, id := .I], id = "id" # Add a row index and melt by it
)[, sum(value) > 0, # Check condition >>
by = .(id, Names = sub("\\..*", "", variable))] # by row id and unique names
dcast(res, id ~ Names, value.var = "V1") # Convert back to wide format
# id apple banana kiwi orange
# 1: 1 TRUE TRUE TRUE FALSE
# 2: 2 TRUE TRUE TRUE FALSE
# 3: 3 FALSE FALSE TRUE FALSE
Upvotes: 2