Maximilian

Reputation: 4229

Select specific names from list of dataframes in R

Sample data:

df <- data.frame(names=letters[1:10],name1=rnorm(10,1,1),name2=rexp(10,2))

list <- list(df,df)

vec_name <- c("f","i","c") # desired values of the names column

For each data frame in the list, I would like to select the rows whose names match vec_name:

Desired outcome:

[[1]]
      names      value1    value2
   6   nd:f   -1.6323952 0.3117470
   9   nd:i    1.8270855 0.2475741
   3   nd:c    0.6978422 0.4695581   # the ordering does matter; must be as seen in vec_name

[[2]]
      names      value1    value2
   6   ad:f   -1.6323952 0.3117470
   9   ad:i    1.8270855 0.2475741
   3   ad:c    0.6978422 0.4695581

Desired output 2: a single data frame, which I believe is just do.call(rbind, list), except that the clean names from vec_name should be used instead:

      names      value1    value2
   1      f   -1.6323952 0.3117470
   2      i    1.8270855 0.2475741
   3      c    0.6978422 0.4695581 
   4      f   -1.6323952 0.3117470
   5      i    1.8270855 0.2475741
   6      c    0.6978422 0.4695581

I have tried sapply, lapply and so on, for example:

lapply(list, function(x) x[grepl(vec_name,x$names),])

EDIT: Please see the edited question above.

Upvotes: 0

Views: 366

Answers (2)

thothal

Reputation: 20329

You were almost there. The warning message was saying:

Warning messages:
1: In grepl(vec_name, x$names) :
   argument 'pattern' has length > 1 and only the first element will be used

The reason is that you pass a vector as the pattern to grepl, which expects a single regular expression (see ?regex). What you want instead is to match the contents:

lapply(list, function(x) x[match(vec_name,x$names),])

This gives you a list of data.frame objects. If you want to combine them afterwards, just use:

do.call(rbind, lapply(list, function(x) x[match(vec_name,x$names),]))

Or use ldply from the plyr package:

library(plyr)
ldply(list, function(x) x[match(vec_name,x$names),])
#   names       name1     name2
# 1     f  2.01421228 0.4489627
# 2     i  0.28899891 0.8323940
# 3     c -0.01746007 1.5309936
# 4     f  2.01421228 0.4489627
# 5     i  0.28899891 0.8323940
# 6     c -0.01746007 1.5309936
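Why match rather than a filter: match(vec_name, x$names) returns, for each element of vec_name, the position of its first match in x$names, so the rows come back in vec_name order, which the question requires. A logical filter with %in% keeps the data frame's own row order instead. A minimal sketch on the question's sample data (set.seed and stringsAsFactors = FALSE are additions for reproducibility; idx, by_match and by_in are illustrative names):

```r
set.seed(1)  # added only to make the random columns reproducible
df <- data.frame(names = letters[1:10],
                 name1 = rnorm(10, 1, 1),
                 name2 = rexp(10, 2),
                 stringsAsFactors = FALSE)
vec_name <- c("f", "i", "c")

# match(): one row index per element of vec_name, in vec_name order
idx <- match(vec_name, df$names)
by_match <- df[idx, ]

# %in%: a logical filter, so rows keep the data frame's original order
by_in <- df[df$names %in% vec_name, ]

print(idx)            # 6 9 3
print(by_match$names) # "f" "i" "c"
print(by_in$names)    # "c" "f" "i"
```

If an element of vec_name does not occur in x$names, match returns NA for it and the corresponding row comes back filled with NA.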

As a side remark: avoid using the names of existing functions, such as list, for your variables, to avoid unwanted effects.

Update

Taking the comments into account (vec_name does not completely match the names in the data.frame), you should first clean the names and then do the match. This assumes, however, that your 'uncleaned' names contain the cleaned names after a prefix separated by a colon (':'); if this is not the case, adapt the regex in the gsub statement:

ldply(list, function(x) x[match(vec_name, gsub(".*:(.*)", "\\1", x$names)),])
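A minimal sketch of this cleaning step on hypothetical data shaped like the question's desired output: the nd:/ad: prefixes and the value1/value2 column names come from the question, the values are made up, and make_df, dfs and pick_clean are hypothetical names. Besides matching on the cleaned names, it also writes the cleaned names back into the names column and resets the row names, which is what desired output 2 asks for:

```r
# Hypothetical data: names carry a prefix separated by ':' (as in the question's output)
make_df <- function(prefix) {
  data.frame(names  = paste0(prefix, ":", letters[1:10]),
             value1 = 1:10,
             value2 = (1:10) / 10,
             stringsAsFactors = FALSE)
}
dfs <- list(make_df("nd"), make_df("ad"))
vec_name <- c("f", "i", "c")

# Hypothetical helper: match on the cleaned names and keep the cleaned names in the result
pick_clean <- function(x) {
  clean <- gsub(".*:(.*)", "\\1", x$names)  # "nd:f" -> "f"; names without ':' stay as they are
  idx <- match(vec_name, clean)             # row positions in vec_name order
  out <- x[idx, ]
  out$names <- clean[idx]                   # "nd:f" is replaced by "f"
  out
}

output2 <- do.call(rbind, lapply(dfs, pick_clean))
rownames(output2) <- NULL                   # renumber the rows 1..6
print(output2)
```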

Upvotes: 1

Cath

Reputation: 24074

For the first output:

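output1 <- lapply(list, function(elt) {
  # one column per element of vec_name: match position in each name (-1 = no match)
  resmatch <- sapply(vec_name, function(x) regexpr(x, elt$names))
  # for each column, in vec_name order, keep the row where the pattern matched
  elt <- elt[apply(resmatch, 2, function(rg) which(rg > 0)), ]
  colnames(elt) <- c("names", "value1", "value2")
  return(elt)
})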

> output1
[[1]]
  names     value1    value2
6  nd:f -0.2132962 0.7618105
9  nd:i -0.6580247 0.6010379
3  nd:c  0.9302625 0.1490061

[[2]]
  names     value1    value2
6  nd:f -0.2132962 0.7618105
9  nd:i -0.6580247 0.6010379
3  nd:c  0.9302625 0.1490061
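One caveat with this approach: regexpr() treats each element of vec_name as a regular expression and matches it anywhere in the string, so a target that also occurs in the prefix (for example "d" against names like "nd:a") matches every row. A sketch, on hypothetical nd:-prefixed names, of anchoring each pattern to the end of the name right after the colon; the paste0(":", v, "$") anchoring is an added assumption, not part of the code above, and it still assumes vec_name contains no regex metacharacters:

```r
# Hypothetical prefixed names, as in the question's desired output
nms <- paste0("nd:", letters[1:10])

# Unanchored: "d" also occurs in the "nd" prefix, so every name matches
hits_plain <- which(regexpr("d", nms) > 0)

# Anchored: the target must directly follow a ':' and end the name
hits_anchored <- which(regexpr(paste0(":", "d", "$"), nms) > 0)

# The same anchoring applied to every element of vec_name
vec_name <- c("f", "i", "c")
rows <- sapply(vec_name, function(v) which(regexpr(paste0(":", v, "$"), nms) > 0))

print(hits_plain)     # all ten positions
print(hits_anchored)  # only position 4, i.e. "nd:d"
print(rows)           # one row index per element of vec_name
```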

For the second output, you can do what you suggested:

output2<-do.call(rbind,output1)

> output2

   names     value1    value2
6   nd:f -0.2132962 0.7618105
9   nd:i -0.6580247 0.6010379
3   nd:c  0.9302625 0.1490061
61  nd:f -0.2132962 0.7618105
91  nd:i -0.6580247 0.6010379
31  nd:c  0.9302625 0.1490061
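Desired output 2 additionally asks for the clean names and rows numbered 1 to 6. A minimal sketch of post-processing output2 to get there; the data frame below is a hypothetical stand-in for output2 with values rounded from the printout above, and sub(".*:", "", ...) assumes, as in the other answer's update, that the prefix ends at a colon:

```r
# Hypothetical stand-in for output2: prefixed names and the row names rbind produced
output2 <- data.frame(names  = c("nd:f", "nd:i", "nd:c", "nd:f", "nd:i", "nd:c"),
                      value1 = c(-0.21, -0.66, 0.93, -0.21, -0.66, 0.93),
                      value2 = c(0.76, 0.60, 0.15, 0.76, 0.60, 0.15),
                      row.names = c("6", "9", "3", "61", "91", "31"),
                      stringsAsFactors = FALSE)

output2$names <- sub(".*:", "", output2$names)  # drop everything up to the last ':'
rownames(output2) <- NULL                       # renumber the rows 1..6
print(output2)
```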

Upvotes: 1
