I have a list of dataframes that have only a few columns in common, and a vector of the columns I wish to keep. Some dataframes have exactly those columns; others are missing a few of them.
If every dataframe contained the same columns, I would simply use subset(df, select = c("column", "names")) to select my columns of interest. But how can I select only the columns that exist?
I have a dummy example, but I wish to use the map or lapply functions on a list, as I have many dataframes in my real data.
My dummy example:
df1 <- data.frame(a = seq(0, 5),
                  b = seq(5, 10),
                  cc = seq(10, 15))
df2 <- data.frame(a = seq(0, 5),
                  b = seq(5, 10),
                  d = seq(10, 15))
ls <- list(df1, df2)
# select columns, "cc" column is missing from df2
keep <- c("b", "cc")
How can I modify this function to select only the columns that exist in each dataframe?
lapply(ls, function(x) subset(x, select = keep) )
Expected output, with a different number of columns per dataframe:
[[1]]
   b cc
1  5 10
2  6 11
3  7 12
4  8 13
5  9 14
6 10 15

[[2]]
   b
1  5
2  6
3  7
4  8
5  9
6 10
Upvotes: 2
Views: 1815
You can use the intersect function:
> intersect(c("a", "b", "c"), c("a", "b"))
[1] "a" "b"
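To connect this to the data in the question, here is a small sketch of what intersect does with the keep vector and the column names of df2. The variable names cols_df2, present, and missing_cols are only illustrative, and the setdiff call is an optional extra for reporting which requested columns are absent.

```r
keep <- c("b", "cc")
cols_df2 <- c("a", "b", "d")  # column names of df2 from the question

# intersect() returns the elements of its first argument that also
# occur in the second, in the order of the first argument
present <- intersect(keep, cols_df2)
print(present)

# setdiff() gives the complement: requested columns that are missing
missing_cols <- setdiff(keep, cols_df2)
print(missing_cols)
```

Putting keep first means the selected columns come out in the order you asked for, not in the order they happen to appear in each dataframe.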
That is, modify your function like this:
> lapply(ls, function(x) subset(x, select = intersect(keep, colnames(x))))
[[1]]
   b cc
1  5 10
2  6 11
3  7 12
4  8 13
5  9 14
6 10 15

[[2]]
   b
1  5
2  6
3  7
4  8
5  9
6 10
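If you prefer plain bracket indexing over subset, the same idea can be sketched as below. This is one possible variant, not the only way to do it: select_existing is a hypothetical helper name, and the list is called dfs here only to avoid masking the base function ls. The drop = FALSE argument matters because selecting a single column with brackets would otherwise return a vector instead of a dataframe.

```r
df1 <- data.frame(a = seq(0, 5), b = seq(5, 10), cc = seq(10, 15))
df2 <- data.frame(a = seq(0, 5), b = seq(5, 10), d = seq(10, 15))
dfs <- list(df1, df2)
keep <- c("b", "cc")

# Hypothetical helper: keep only the requested columns that exist in x.
# drop = FALSE keeps the result a dataframe even with a single column.
select_existing <- function(x, cols) {
  x[, intersect(cols, names(x)), drop = FALSE]
}

result <- lapply(dfs, select_existing, cols = keep)
print(result)
```

If a dataframe contains none of the requested columns, this returns a dataframe with zero columns and the original number of rows, so you may want to check for that case in real data.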
Upvotes: 3