Kumar

Reputation: 188

Dealing with data frames in a list: subsetting rows using a condition and another data frame in R

I have a list with multiple data frames, 'mylist1', and a data frame, 'mydf'. With these two, I need to solve two problems using R.

The actual list contains many data frames and the actual data frame contains 10000 rows. Only sample data is shown here.

First problem: I have a list with multiple data frames. The following list is an example:

mylist1 <- list(a = data.frame(ID = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"), colb = c(3.67, 4.94, 8.11, 2.85, 9.53, 7.5), colc = c(3.45, 6.19, 4.96, 6.73, 9.26, 8.62)), 
       b = data.frame(cola = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"), colb = c(5.24, 3.62, 0.29, 6.65, 7.86, 8.7), colc = c(7.03, 7.51, 0.842, 3.56, 8.68, 5.844)))

I would like to subset the rows in each data frame of the list using a condition on the column 'colc': keep only the rows where the value in 'colc' is >= 6.

The expected output-1 from mylist1 is as follows:

mylistoutput <- list(a = data.frame(ID = c("b_1", "d_1", "e_1", "f_1"), colb = c(4.94, 2.85, 9.53, 7.5), colc = c(6.19, 6.73, 9.26, 8.62)), 
       b = data.frame(cola = c("a_1", "b_1", "e_1"), colb = c(5.24, 3.62, 7.86), colc = c(7.03, 7.51, 8.68)))

I tried to subset the rows using the condition with filter/subset as follows:

mylistoutput <- lapply(mylist, function(x) filter(x$colc >= 6))

but it failed.

Second problem: from 'mylistoutput', I would like to do two things.

First, I would like to match the ids in the 'ID' column of the first data frame of 'mylistoutput' with the ids in the data frame 'mydf'.

A sample of the data frame 'mydf' is as follows:

mydf <- data.frame(ID = c("a_1","a_1","a_1","a_1","a_1", "b_1","b_1","b_1","b_1", "c_1","c_1","c_1", "d_1","d_1","d_1", "e_1","e_1","e_1","e_1","e_1", "f_1","f_1","f_1","g_1","g_1","g_1","g_1","g_1"), colb = c(3.67,1,2.3,2.5,5, 1.1,2.2,3.7,4.94, 8.11,1.23,2, 2.85,1,2, 5,4,9.53,4,5, 8,7,7.5, 1,2,3,4,5), colc = c(3.45,1,2,3,4, 6.19,1,2,3, 4.96,1,2, 6.73,1,2, 9.26,1,2,3,4, 8.62,1,2, 1,2,3,4,5))

Now, I would like to extract all the rows of 'mydf' whose ids match those in the first data frame of 'mylistoutput'.

The expected output from 'mydf' is as follows:

 mydfoutput1 <- data.frame(ID = c("b_1","b_1","b_1","b_1", "d_1","d_1","d_1", "e_1","e_1","e_1","e_1","e_1", "f_1","f_1","f_1"), colb = c(1.1,2.2,3.7,4.94, 2.85,1,2, 5,4,9.53,4,5, 8,7,7.5), colc = c(6.19,1,2,3, 6.73,1,2, 9.26,1,2,3,4, 8.62,1,2))

Second, I would like to select the ids that are common to all the data frames in the list 'mylistoutput'. For instance, "b_1" and "e_1" are the common ids in both data frames of the list 'mylistoutput'. Then I would like to subset the rows with those same ids, i.e. "b_1" and "e_1", from the data frame 'mydf'.

The expected output is as follows:

mydfoutput2 <- data.frame(ID = c("b_1","b_1","b_1","b_1", "e_1","e_1","e_1","e_1","e_1"), colb = c(1.1,2.2,3.7,4.94, 5,4,9.53,4,5), colc = c(6.19,1,2,3, 9.26,1,2,3,4))

I am looking for code to solve the above problems.

Upvotes: 1

Views: 58

Answers (1)

akrun

Reputation: 887891

We can use lapply with subset:

out <- lapply(mylist1, subset, subset = colc >= 6)
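As a minimal sketch of what that call does, the block below rebuilds the sample list from the question and applies the same `lapply`/`subset` combination. The extra arguments given to `lapply` are forwarded to `subset`, so the condition `colc >= 6` is evaluated inside each data frame in turn. Only the data and the names `mylist1` and `out` come from the thread.

```r
# Sample list copied from the question
mylist1 <- list(
  a = data.frame(ID   = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"),
                 colb = c(3.67, 4.94, 8.11, 2.85, 9.53, 7.5),
                 colc = c(3.45, 6.19, 4.96, 6.73, 9.26, 8.62)),
  b = data.frame(cola = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"),
                 colb = c(5.24, 3.62, 0.29, 6.65, 7.86, 8.7),
                 colc = c(7.03, 7.51, 0.842, 3.56, 8.68, 5.844))
)

# Keep only the rows with colc >= 6 in every data frame of the list;
# `subset = colc >= 6` is passed through lapply's ... to subset()
out <- lapply(mylist1, subset, subset = colc >= 6)

# ids that survive the filter in each data frame
as.character(out$a$ID)    # "b_1" "d_1" "e_1" "f_1"
as.character(out$b$cola)  # "a_1" "b_1" "e_1"
```

The result is still a named list with one filtered data frame per element, which matches the expected output-1 in the question apart from the original row names being kept.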

For the first part of the second problem (ids matching the first data frame), we can do

subset(mydf, ID %in% out[[1]]$ID)

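For the second part (ids common to all data frames in the list), use Reduce with intersect; `[[` with 1 extracts the first column of each data frame, which holds the ids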

subset(mydf, ID %in% Reduce(intersect, lapply(out, `[[`, 1)))
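To make the two lookups easier to follow, here is a step-by-step sketch under the assumption that `out` equals the filtered list shown as expected output-1 in the question (it is rebuilt here so the block runs on its own). The intermediate names `ids`, `common`, `res1` and `res2` are only illustrative.

```r
# Filtered list, copied from the question's expected output-1
out <- list(
  a = data.frame(ID   = c("b_1", "d_1", "e_1", "f_1"),
                 colb = c(4.94, 2.85, 9.53, 7.5),
                 colc = c(6.19, 6.73, 9.26, 8.62)),
  b = data.frame(cola = c("a_1", "b_1", "e_1"),
                 colb = c(5.24, 3.62, 7.86),
                 colc = c(7.03, 7.51, 8.68))
)

# Sample mydf copied from the question
mydf <- data.frame(
  ID   = c("a_1","a_1","a_1","a_1","a_1", "b_1","b_1","b_1","b_1",
           "c_1","c_1","c_1", "d_1","d_1","d_1", "e_1","e_1","e_1","e_1","e_1",
           "f_1","f_1","f_1", "g_1","g_1","g_1","g_1","g_1"),
  colb = c(3.67,1,2.3,2.5,5, 1.1,2.2,3.7,4.94, 8.11,1.23,2, 2.85,1,2,
           5,4,9.53,4,5, 8,7,7.5, 1,2,3,4,5),
  colc = c(3.45,1,2,3,4, 6.19,1,2,3, 4.96,1,2, 6.73,1,2,
           9.26,1,2,3,4, 8.62,1,2, 1,2,3,4,5)
)

# Rows of mydf whose ID appears in the first data frame of the list
res1 <- subset(mydf, ID %in% out[[1]]$ID)
nrow(res1)  # 15

# Pull the first column (the ids) out of every data frame,
# then intersect the id vectors pairwise across the whole list
ids    <- lapply(out, `[[`, 1)
common <- Reduce(intersect, ids)
common      # "b_1" "e_1"

# Rows of mydf whose ID is common to all data frames in the list
res2 <- subset(mydf, ID %in% common)
nrow(res2)  # 9
```

Extracting by position with `[[` and 1 is what lets this work even though the id column is called `ID` in one data frame and `cola` in the other; it does assume the ids are always in the first column.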

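As for the attempt in the question: filter is from dplyr, and its first argument must be a data.frame, not a vector. Pass the data frame first and the condition second (the list in the example is also named mylist1, not mylist)

library(dplyr)
lapply(mylist1, function(x) filter(x, colc >= 6))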

Upvotes: 3
