Reputation: 555
I'm trying to filter this dataframe called df
df <- structure(list(ï..ID = structure(c(2L, 1L, 4L, 6L, 3L, 7L, 5L,
8L), .Label = c("Jay ", "Jim", "Jim ", "John ", "Mike ", "Peter",
"Peter ", "Tom"), class = "factor"), Target1 = structure(c(8L,
4L, 6L, 5L, 2L, 1L, 3L, 7L), .Label = c("Andreas", "Cheyne",
"Frank", "John", "Mickey", "Raj", "Sarah", "Timothy"), class = "factor"),
Target2 = structure(c(4L, 3L, 1L, 5L, 2L, 1L, 1L, 1L), .Label = c("",
"Jake", "Peter", "Timothy ", "Tommy "), class = "factor"),
Parter1 = structure(c(3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"Mike ", "Timothy"), class = "factor"), Parter2 = structure(c(1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Peter"), class = "factor")), class = "data.frame", row.names = c(NA,
-8L))
I want to add a new column called flag, based on how often the names in this vector x
x=c("Raj", "Timothy")
appear in columns 2-5 of each row: flag = 1 if those names appear more than 2 times in the row, and flag = 0 otherwise.
Upvotes: 0
Views: 564
Reputation: 388982
In base R, we could use apply with MARGIN = 1 (row-wise):
df$flag <- as.integer(apply(df, 1, function(row) sum(row %in% x)) > 2)
df
#  ï..ID Target1 Target2 Parter1 Parter2 flag
#1   Jim Timothy Timothy Timothy            1
#2   Jay    John   Peter    Mike   Peter    0
#3  John     Raj                            0
#4 Peter  Mickey   Tommy                    0
#5   Jim  Cheyne    Jake                    0
#6 Peter Andreas                            0
#7  Mike   Frank                            0
#8   Tom   Sarah                            0
apply converts the data frame to a matrix and can sometimes be slow. You can avoid the apply call by using sapply with the same logic:
df$flag <- as.integer(sapply(1:nrow(df), function(i) sum(df[i, ] %in% x)) > 2)
Another way to write it:
df$flag <- as.integer(colSums(sapply(1:nrow(df), function(i) df[i, ] %in% x)) > 2)
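If the row-by-row loop turns out to be a bottleneck, a column-wise count may be worth trying. This is only a sketch, not one of the approaches above: it assumes the names are already trimmed character strings, it uses a small made-up data frame (toy) in place of df, and it restricts the check to columns 2-5 as the question describes.

```r
# Hypothetical toy data standing in for df (names already trimmed)
toy <- data.frame(
  ID      = c("Jim", "Jay", "John"),
  Target1 = c("Timothy", "John", "Raj"),
  Target2 = c("Timothy", "Peter", ""),
  Parter1 = c("Timothy", "Mike", ""),
  Parter2 = c("", "Peter", ""),
  stringsAsFactors = FALSE
)
x <- c("Raj", "Timothy")

# For each of columns 2-5, mark the cells found in x, then add the
# logical vectors together to get the number of matches per row
counts <- Reduce(`+`, lapply(toy[2:5], function(col) col %in% x))

# Flag rows where the names appear more than 2 times
toy$flag <- as.integer(counts > 2)
toy$flag
```

Working column by column means only four vectorised %in% calls are made, regardless of the number of rows, and the ID column is never included in the count.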
PS: Some of the names contain trailing whitespace, so I first had to run
df[] <- lapply(df, trimws)
to remove it.
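The trimming matters because %in% compares strings exactly, so a name with a trailing space does not match. A minimal illustration with made-up values (it also shows that trimws returns character, so the factor columns become character columns after the lapply call):

```r
x <- c("Raj", "Timothy")

"Timothy " %in% x          # FALSE: the trailing space prevents a match
trimws("Timothy ") %in% x  # TRUE once the whitespace is removed

# trimws() on a factor returns a character vector
f <- factor(c("Timothy ", "Raj"))
trimmed <- trimws(f)
class(trimmed)             # "character"
```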
Upvotes: 1