R: Select columns from a list of dataframes while some columns do not exist in few dataframes

Question

I have a list of dataframes that have only a few columns in common, and a vector of the columns I wish to keep. Some dataframes contain all of those columns, while others are missing a few of them.

If every dataframe contained the same columns, I would simply use subset(df, select = c("column", "names")) to select my columns of interest. But how can I select only the columns that exist?

The example below is a dummy one. My real data contains many dataframes, so I would like a solution that uses map or lapply on a list.

My dummy example:

df1<- data.frame(a  = seq(0,5),
                 b  = seq(5,10),
                 cc = seq(10,15))

df2<- data.frame(a  = seq(0,5),
                 b  = seq(5,10),
                 d = seq(10,15))


ls <-list(df1, df2)

# columns to keep; "cc" is missing from df2
keep <- c("b", "cc")

How can I modify this function so that it selects only the columns that exist in each dataframe?

lapply(ls, function(x) subset(x, select = keep) )

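Expected output: a list in which each dataframe keeps only the requested columns it actually has, so the number of columns may differ between dataframes.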

MRau · Accepted Answer

You can use the intersect function, which returns the elements common to both vectors:

> intersect(c("a", "b", "c"), c("a", "b"))
[1] "a" "b"

That is, modify your function like this:

> lapply(ls, function(x) subset(x, select = intersect(keep, colnames(x))))
[[1]]
   b cc
1  5 10
2  6 11
3  7 12
4  8 13
5  9 14
6 10 15

[[2]]
   b
1  5
2  6
3  7
4  8
5  9
6 10
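
The same idea can also be written with bracket indexing instead of subset. The following is only a sketch of that variant, not part of the original answer: select_existing is a hypothetical helper name, and the list is called dfs here (an assumption, to avoid masking the base function ls). Passing drop = FALSE keeps the result a dataframe even when only one column survives.

```r
# Hypothetical helper: keep only the requested columns that `df` actually has.
select_existing <- function(df, cols) {
  present <- intersect(cols, names(df))  # requested columns found in df, in the order of `cols`
  df[, present, drop = FALSE]            # drop = FALSE: a single column stays a data frame
}

# Same dummy data as in the question
df1 <- data.frame(a = seq(0, 5), b = seq(5, 10), cc = seq(10, 15))
df2 <- data.frame(a = seq(0, 5), b = seq(5, 10), d  = seq(10, 15))
dfs  <- list(df1, df2)
keep <- c("b", "cc")

# Apply the helper to every data frame in the list
result <- lapply(dfs, select_existing, cols = keep)
```

With the dummy data, result[[1]] should contain columns b and cc, and result[[2]] only b. If a dataframe has none of the requested columns, this sketch returns a dataframe with zero columns rather than raising an error.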
