Reputation: 1
I have a list containing nine dataframes (called data), each of varying lengths and contents. Consistent across most of them, though, are columns containing information that I want to store in a separate dataframe for later use.
These columns are the following:
identifiers <- c("Organism Name", "Protein names", "Gene names", "Pathway", "Biological Process")
I want to iterate through each element of data to check if it contains the columns I'm interested in, then subset these columns as separate dataframes.
I first tried
lapply(data, '[', identifiers)
The problem with this is that not all of the dfs contain all of the identifiers listed above, so running this returns 'undefined columns selected'.
My next attempt was
lapply(data, function(x) if(identifiers %in% x) '[', identifiers)
which returned a list of 9 (corresponding to the 9 original dataframes) of class NULL. I think that this general method would work with proper execution, but I just can't figure it out.
Any help would be appreciated :)
Upvotes: 0
Views: 665
Reputation: 160792
Since identifiers is a vector of column names, some or all of which may be in each frame, we can do:
lapply(data, function(x) x[,intersect(names(x), identifiers),drop=FALSE])
with the understanding that some elements may have zero columns (if none are found).
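As a quick illustration of the intersect approach, here is a minimal sketch on toy data. The frames full, partial and none, their values, and the extra columns Length and Mass are made up for the example and stand in for the real list.

```r
identifiers <- c("Organism Name", "Protein names", "Gene names",
                 "Pathway", "Biological Process")

# Hypothetical stand-ins for the real frames; check.names = FALSE keeps
# the spaces in the column names.
data <- list(
  full = data.frame(`Organism Name` = "a", `Protein names` = "b",
                    `Gene names` = "c", Pathway = "d",
                    `Biological Process` = "e", Length = 1,
                    check.names = FALSE),
  partial = data.frame(`Organism Name` = "a", Pathway = "d", Mass = 1,
                       check.names = FALSE),
  none = data.frame(Length = 1:2, Mass = c(10, 20))
)

# Keep whichever identifier columns each frame actually has.
subsets <- lapply(data, function(x) x[, intersect(names(x), identifiers), drop = FALSE])

# Number of columns kept per frame: 5 for full, 2 for partial, 0 for none.
sapply(subsets, ncol)
```

The none element is still a data frame with its original two rows, just with zero columns, which is what drop=FALSE guarantees.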
Your use of if (identifiers %in% x) is not quite right for two reasons:
1. identifiers %in% x is looking for presence in the data, not in the names; it should be identifiers %in% names(x); and
2. if requires exactly one logical, but identifiers %in% names(x) is going to return a logical vector the same length as identifiers (i.e., not one). It needs to be summarized, for example with all() or any().
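To see both points on a single frame, here is a small sketch. The frame x and its values are hypothetical.

```r
identifiers <- c("Organism Name", "Protein names", "Gene names",
                 "Pathway", "Biological Process")

# Hypothetical frame holding only two of the five columns.
x <- data.frame(`Organism Name` = "a", Pathway = "d", check.names = FALSE)

# Matching against the names gives one logical per identifier.
found <- identifiers %in% names(x)
found        # TRUE FALSE FALSE TRUE FALSE

# if() needs a single TRUE/FALSE, so summarize the vector first.
all(found)   # FALSE: not every identifier is present
any(found)   # TRUE: at least one identifier is present

# By contrast, this matches the identifiers against the contents of x,
# not its column names, so it does not answer the question being asked.
identifiers %in% x
```

Passing found directly to if() gives a condition of length 5, which R either rejects or warns about depending on the version.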
If a frame that contains any of these columns is guaranteed to contain all of them, then you can change my code above to be
lapply(data, function(x) if (all(identifiers %in% names(x))) x[,identifiers])
and frames without those columns will return NULL. My use above of intersect also works in this regard, the functional difference being in the case where a frame contains some but not all of them. Over to you which logic you prefer.
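The difference shows up on a frame that has only some of the columns. A minimal sketch, assuming two made-up frames named full and partial; the Filter() step at the end is an optional extra, not part of the answer above.

```r
identifiers <- c("Organism Name", "Protein names", "Gene names",
                 "Pathway", "Biological Process")

# Hypothetical frames: one with every identifier, one with only two.
data <- list(
  full = data.frame(`Organism Name` = "a", `Protein names` = "b",
                    `Gene names` = "c", Pathway = "d",
                    `Biological Process` = "e", check.names = FALSE),
  partial = data.frame(`Organism Name` = "a", Pathway = "d", Mass = 1,
                       check.names = FALSE)
)

# intersect: keep whatever identifier columns are present.
by_intersect <- lapply(data, function(x) x[, intersect(names(x), identifiers), drop = FALSE])

# all(): keep the columns only when every identifier is present, else NULL.
all_or_nothing <- lapply(data, function(x) if (all(identifiers %in% names(x))) x[, identifiers])

names(by_intersect$partial)      # "Organism Name" "Pathway"
is.null(all_or_nothing$partial)  # TRUE

# Optionally drop the NULL entries afterwards.
kept <- Filter(Negate(is.null), all_or_nothing)
names(kept)                      # "full"
```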
Upvotes: 2