Reputation: 1

Complex subsetting of data

Consider the following data frame, which has 7 variables: id and A through F.

id   A       B       C      D      E       F
1    5590    23658   523    727    52903   732569
2    24311   421     4319   5597   32695   4521
3    626     623     78

I would like a new variable G that only includes observations which have either the value 5590 or 421 in any of A through F.

So G would only include observations with id = 1 and 2 in this case.

Is there a fast way to scan the variables A through F in R?

Upvotes: 0

Answers (3)

Chris Ruehlemann

Reputation: 21432

This is a fast and simple solution:

# \\b restricts matches to whole numbers, so 4211 or 55901 are not picked up
dfr[grepl("\\b(5590|421)\\b", apply(dfr, 1, paste, collapse = " ")), ]
     V1    V2   V3   V4    V5     V6
1  5590 23658  523  727 52903 732569
2 24311   421 4319 5597 32695   4521

It works by subsetting dfr to those rows in which the function grepl finds a match for either number in the pasted-together row.
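As a minimal sketch of the paste-and-match idea described above, the snippet below builds a small data frame from the first two rows of the question plus one decoy row (the decoy values 4211 and 55901 are made up for illustration) and shows that word boundaries keep partial matches out. The name rows_as_text is only a helper name chosen here.

```r
# First two rows come from the question; the third row is a made-up decoy
dfr <- read.table(text = "
5590  23658 523  727  52903 732569
24311 421   4319 5597 32695 4521
24311 4211  4319 5597 32695 55901
")

# Collapse each row into a single space-separated string
rows_as_text <- apply(dfr, 1, paste, collapse = " ")

# \\b anchors the pattern at word boundaries, so only whole numbers match
hits <- grepl("\\b(5590|421)\\b", rows_as_text)

# Keep only the rows that contain 5590 or 421 as a complete value
dfr[hits, ]
```

Without the boundaries, a pattern such as "5590|421" would also flag the decoy row, because 421 is a substring of 4211.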

Upvotes: 0

Johannes Stephan

Reputation: 65

dfr <- read.table(text= "5590       23658      523        727       52903     732569
24311      421        4319       5597      32695     4521
24311      431        4319       5597      32695     4521
24311      4211        431239       5597      32695     43521")

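# as.numeric() cannot be applied to a whole data frame, so convert column by column
dfr[] <- lapply(dfr[, 1:6], as.numeric)
# remove a leftover subsetdfr from an earlier run, if there is one
if (exists('subsetdfr')) remove('subsetdfr')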

i <- 0

# dim(dfr)[2] gives the number of columns of the data frame
while (i < dim(dfr)[2]){
  i <- i+1
  
  if (exists('subsetdfr')) {
    # append the matching rows if subsetdfr already exists
    subsetdfr <- rbind(subsetdfr, subset(dfr, dfr[[i]] == 421 | dfr[[i]] == 5590))
    } else {
    # create subsetdfr if it does not exist yet
    subsetdfr <- data.frame(subset(dfr, dfr[[i]] == 421 | dfr[[i]] == 5590))
    }
  
}

subsetdfr

This results in:

> subsetdfr
     V1    V2   V3   V4    V5     V6
1  5590 23658  523  727 52903 732569
2 24311   421 4319 5597 32695   4521
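
The same result can probably be reached without a loop. The following is only a sketch of a vectorized alternative, not part of the approach above: it tests every cell against the target values at once and keeps rows with at least one hit. The names targets and is_target are chosen here for illustration.

```r
dfr <- read.table(text = "
5590  23658 523    727  52903 732569
24311 421   4319   5597 32695 4521
24311 431   4319   5597 32695 4521
24311 4211  431239 5597 32695 43521
")

targets <- c(5590, 421)

# %in% returns a flat logical vector in column-major order,
# so it is reshaped back to the dimensions of dfr
is_target <- matrix(as.matrix(dfr) %in% targets, nrow = nrow(dfr))

# Keep rows where at least one cell equals a target value
subsetdfr <- dfr[rowSums(is_target) > 0, ]
subsetdfr
```

Unlike the loop, this keeps each matching row once even if it contains a target value in more than one column.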

Upvotes: 0

akrun

Reputation: 887501

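We can use apply to loop over the rows. G then holds the matched value, or NA if the row contains neither 5590 nor 421: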

df1$G  <- apply(df1[-1], 1, function(x) intersect(x, c(5590, 421))[1])
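
A short usage sketch, assuming df1 looks like the data in the question: the missing cells in the third row are filled with NA here, which is an assumption about the original data. It shows what G contains and how it could be used to keep only the matching observations.

```r
# Data from the question; NA stands in for the cells missing in row 3 (assumption)
df1 <- data.frame(
  id = 1:3,
  A = c(5590, 24311, 626),
  B = c(23658, 421, 623),
  C = c(523, 4319, 78),
  D = c(727, 5597, NA),
  E = c(52903, 32695, NA),
  F = c(732569, 4521, NA)
)

# For each row, take the first value shared with the targets (NA if none)
df1$G <- apply(df1[-1], 1, function(x) intersect(x, c(5590, 421))[1])

# Rows with a non-missing G are the observations that contain 5590 or 421
matched <- df1[!is.na(df1$G), ]
matched$id
```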

Upvotes: 1
