lel

Reputation: 163

Keeping rows that match certain column values in R

I have a list of data frames, and I want to loop through them and keep the rows whose values in certain columns match given values. I want to pass a vector of those column names and their corresponding values.

Example:

DF1 =
x   y
10  s
5   h

DF2 =
x   z   y
11  h   h
5   s   s
5   h   s

So I want to loop through those data frames and keep any rows where the columns (x, y) have the values (5, s). That's just an example; I want to generalize my code.

I'm thinking of something like this, but it certainly doesn't work as written:

Data-Mining = sapply(DFlist,)

I appreciate the help.

Upvotes: 2

Views: 2667

Answers (2)

thelatemail

Reputation: 93908

What about something like this, which relies on merge to keep the matching rows? It might be easier than writing a selection statement if you have many variables to match. I've added an extra dataset that has neither an x nor a y variable to show how this accounts for that issue.

DF1 <- data.frame(x=c(10,5), y=c('s','h'))
DF2 <- data.frame(x=c(11,5,5), z=c('h', 's', 'h'), y = c('h','s','s'))
DF3 <- data.frame(a=1:3,b=2:4)

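vals <- list(5, "s")
nams <- c("x", "y")
lapply(list(DF1, DF2, DF3), function(DAT) {
  # add any missing key columns as NA so that merge() still has them to join on
  DAT[setdiff(nams, names(DAT))] <- NA
  # inner join against the one-row lookup keeps only the matching rows
  merge(DAT, setNames(vals, nams), by = nams)
})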

#[[1]]
#[1] x y
#<0 rows> (or 0-length row.names)
#
#[[2]]
#  x y z
#1 5 s s
#2 5 s h
# 
#[[3]]
#[1] x y a b
#<0 rows> (or 0-length row.names)
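
For comparison, the same selection can be sketched in base R without merge, by building one logical vector per criterion and combining them. This is only a minimal sketch: keep_matching and criteria are hypothetical names that are not part of the answer above, and it assumes that a data frame missing any of the named columns should come back with zero rows (its own columns are left untouched, rather than padded with NA as merge does).

```r
# Hypothetical helper: keep the rows of `dat` where every column named in
# `criteria` holds one of the values paired with that name.
keep_matching <- function(dat, criteria) {
  missing_cols <- setdiff(names(criteria), names(dat))
  if (length(missing_cols) > 0) {
    # Assumption: a data frame lacking a key column matches nothing.
    return(dat[0, , drop = FALSE])
  }
  # One logical vector per criterion, combined with an elementwise AND.
  hits <- Map(function(col, val) dat[[col]] %in% val, names(criteria), criteria)
  dat[Reduce(`&`, hits), , drop = FALSE]
}

DF1 <- data.frame(x = c(10, 5), y = c("s", "h"))
DF2 <- data.frame(x = c(11, 5, 5), z = c("h", "s", "h"), y = c("h", "s", "s"))
DF3 <- data.frame(a = 1:3, b = 2:4)

# Named list: column name -> accepted value(s).
criteria <- list(x = 5, y = "s")
result <- lapply(list(DF1, DF2, DF3), keep_matching, criteria = criteria)
```

Unlike merge, this keeps the original column order and row names of each data frame, and each entry of criteria may hold several accepted values because the comparison uses %in%.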

Upvotes: 1

Tad Dallas

Reputation: 1189

Below, I recreate the two data.frame objects you provide in your example, and then use lapply and two functions from the dplyr package, select and filter, to produce your desired output. select keeps the columns of interest, and filter keeps the rows that meet one or more logical criteria.

library(dplyr)

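DF1 <- data.frame(x = c(10, 5), y = c('s', 'h'))
DF2 <- data.frame(x = c(11, 5, 5), z = c('h', 's', 'h'), y = c('h', 's', 's'))

DFlist <- list(DF1, DF2)

colsKeep <- c('x', 'y')  # columns to keep
xRange <- 1:5            # accepted values of x
yVal <- 's'              # accepted value of y

# The function argument is named df so it is not confused with the column x
lapply(DFlist, function(df) {
  df %>%
    select(one_of(colsKeep)) %>%
    filter(x %in% xRange & y == yVal)
})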

edit: I now specify the columns to keep, and the value(s) that I'll accept in the subset.
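
If the column names should not be hard-coded inside filter, one possible generalization is to put the accepted values in a small lookup data frame and use dplyr's semi_join, which keeps the rows of one table that have a match in another. This is a sketch under assumptions: wanted and filtered are hypothetical names, every data frame in the list is assumed to contain all the key columns, and unlike the code above it keeps all columns rather than only colsKeep.

```r
library(dplyr)

DF1 <- data.frame(x = c(10, 5), y = c("s", "h"))
DF2 <- data.frame(x = c(11, 5, 5), z = c("h", "s", "h"), y = c("h", "s", "s"))
DFlist <- list(DF1, DF2)

# Hypothetical lookup table: one row per accepted (x, y) combination.
wanted <- data.frame(x = 5, y = "s")

filtered <- lapply(DFlist, function(df) {
  # semi_join() keeps the rows of df that have a match in `wanted`
  # and does not add any columns from `wanted`.
  semi_join(df, wanted, by = names(wanted))
})
```

Adding more rows to wanted accepts more combinations, and adding more columns tightens the match, without changing the lapply call.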

Upvotes: 1
