Reputation: 1
Consider the following data frame with 7 variables: id and A through F.
id A B C D E F
1 5590 23658 523 727 52903 732569
2 24311 421 4319 5597 32695 4521
3 626 623 78
I would like a new variable G that only includes observations which have either the value 5590 or 421 in any of A through F.
So G would only include observations with id = 1 and 2 in this case.
Is there a fast way to scan the variables A through F in R?
Upvotes: 0
Views: 42
Reputation: 21432
This is a fast and simple solution:
dfr[grepl("\\b(5590|421)\\b", apply(dfr, 1, paste, collapse = " ")), ]
V1 V2 V3 V4 V5 V6
1 5590 23658 523 727 52903 732569
2 24311 421 4319 5597 32695 4521
It works by subsetting dfr on those rows in which the function grepl finds matches for either number in the pasted-together rows.
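Matching on pasted strings needs some care, because a bare pattern such as 421 also matches inside a longer number such as 4211. The following is a minimal sketch of anchoring each number with word boundaries (`\\b`). The three sample rows are made up for illustration, and the names `rows` and `hits` are not part of the original answer.

```r
# Hypothetical sample data: row 3 contains 4211, which must NOT count as 421
dfr <- read.table(text = "5590 23658 523 727 52903 732569
24311 421 4319 5597 32695 4521
24311 4211 4319 5597 32695 4521")

# Paste each row into a single space-separated string
rows <- apply(dfr, 1, paste, collapse = " ")

# \\b marks a word boundary, so only whole numbers can match
hits <- grepl("\\b(5590|421)\\b", rows)

# Keep the rows where one of the two numbers was found
dfr[hits, ]
```

Without the boundaries, the third row would be returned as well.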
Upvotes: 0
Reputation: 65
dfr <- read.table(text= "5590 23658 523 727 52903 732569
24311 421 4319 5597 32695 4521
24311 431 4319 5597 32695 4521
24311 4211 431239 5597 32695 43521")
dfr[] <- lapply(dfr, as.numeric)
# remove subsetdfr if it already exists
if (exists('subsetdfr')) remove('subsetdfr')
i <- 0
# dim(dfr)[2] gives the number of columns of the data frame
while (i < dim(dfr)[2]){
i <- i+1
if (exists('subsetdfr') == TRUE ) {
# append the matching rows if subsetdfr already exists
subsetdfr <- rbind(subsetdfr,subset(dfr, dfr[i] == 421 | dfr[i] == 5590 ))
} else {
# create subsetdfr if it does not exist yet
subsetdfr <- data.frame(subset(dfr, dfr[i] == 421 | dfr[i] == 5590 ))
}
}
subsetdfr
This results in:
> subsetdfr
V1 V2 V3 V4 V5 V6
1 5590 23658 523 727 52903 732569
2 24311 421 4319 5597 32695 4521
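The same selection can probably be written without an explicit loop. This is a minimal sketch of a different, vectorized technique (not the loop above), assuming the same four sample rows; `targets` and `keep` are names introduced here for illustration. It has the side benefit that a row matching in several columns is returned only once.

```r
dfr <- read.table(text = "5590 23658 523 727 52903 732569
24311 421 4319 5597 32695 4521
24311 431 4319 5597 32695 4521
24311 4211 431239 5597 32695 43521")

# Values to look for (assumption: exact numeric matches are wanted)
targets <- c(5590, 421)

# Test every column for membership in targets, then combine the
# per-column logical vectors with OR: TRUE if any column matches
keep <- Reduce(`|`, lapply(dfr, `%in%`, targets))

# Subset once, so no row can be duplicated
subsetdfr <- dfr[keep, ]
subsetdfr
```

Because `%in%` treats NA as a non-match, rows with missing values are simply skipped rather than causing NA indices.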
Upvotes: 0
Reputation: 887501
We can use apply to store in G the first matching value found in each row (NA if there is none):
df1$G <- apply(df1[-1], 1, function(x) intersect(x, c(5590, 421))[1])
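As a rough illustration of what this produces, here is a small sketch on data modelled on the question. The NA values in row 3 are an assumption, since that row is incomplete in the question, and the final filtering step is one possible way to keep only the matching observations.

```r
# Sample data modelled on the question; the NAs in row 3 are an assumption,
# because the original row is incomplete
df1 <- data.frame(id = 1:3,
                  A = c(5590, 24311, 626),
                  B = c(23658, 421, 623),
                  C = c(523, 4319, 78),
                  D = c(727, 5597, NA),
                  E = c(52903, 32695, NA),
                  F = c(732569, 4521, NA))

# G holds the first target value found in A..F, or NA if none is present
df1$G <- apply(df1[-1], 1, function(x) intersect(x, c(5590, 421))[1])

# Keep only the observations where a target value was found
df1[!is.na(df1$G), ]
```

Rows without either value get NA in G, so filtering on `!is.na(df1$G)` should leave only id 1 and 2 here.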
Upvotes: 1