maurice vergeer

Reputation: 85

Extracting a dataframe from each object in a list of many objects

I have over 1,000 objects (z) in R, each containing three dataframes (df1, df2, df3) with different structures.

z1$df1 ... z1000$df1

z1$df2 ... z1000$df2

z1$df3 ... z1000$df3

I created a list of these objects (list1 thus contains z1 through z1000) and tried to use lapply to extract one type of dataframe (df2) from all objects, and then merge them into a single dataframe.

Extraction:

For a single object it would look like this:

df15 <- z15$df2  # the index of z is carried over to the name of the extracted df

I tried some code with lapply, ignoring the transfer of the index (I can create another list for that). However, I don't know what function I should pass to it.

list2 <- lapply(list1, function(x))

I'm trying to avoid a loop because there are so many objects and vectorized code is much quicker. I suspect I'm looking at this from the wrong angle.

Subsequent merging can be done as follows:

merged <- do.call(rbind, list2)

Thanks for any suggestions.

Upvotes: 2

Views: 4686

Answers (4)

C8H10N4O2

Reputation: 19025

There's also data.table::rbindlist, which is likely faster than do.call(rbind, lapply(...)) or dplyr::bind_rows:

library(data.table)
rbindlist(lapply(list1, "[[", "df2"))
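
If the origin of each row matters (the question mentions carrying over the index of z), rbindlist's idcol argument can record it. A minimal sketch, assuming list1 is a named list; the toy data built from BOD and the column name "source" are made up for illustration:

```r
library(data.table)

# Hypothetical toy input: two objects, each holding three data frames
z <- list(df1 = BOD, df2 = BOD, df3 = BOD)
list1 <- list(z1 = z, z2 = z)

# Extract df2 from every object, then stack the pieces;
# idcol = "source" stores the name of the list element each row came from
merged <- rbindlist(lapply(list1, "[[", "df2"), idcol = "source")

head(merged)
```

If list1 has no names, idcol should fall back to the integer position of each element instead.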

Upvotes: 0

MKR

Reputation: 20095

One option is to use lapply to extract each data.frame and then combine them with bind_rows from dplyr.

library(dplyr)  # provides bind_rows()

## The data
df1 <- data.frame(id = 1:10, name = LETTERS[1:10], stringsAsFactors = FALSE)
df2 <- data.frame(id = 11:20, name = LETTERS[11:20], stringsAsFactors = FALSE)
df3 <- data.frame(id = 21:30, name = LETTERS[15:24], stringsAsFactors = FALSE)
df4 <- data.frame(id = 121:130, name = LETTERS[15:24], stringsAsFactors = FALSE)

z1 <- list(df1 = df1, df2 = df2, df3 = df3)
z2 <- list(df1 = df1, df2 = df2, df3 = df3)
z3 <- list(df1 = df1, df2 = df2, df3 = df3)
z4 <- list(df1 = df1, df2 = df2, df3 = df4) #DFs can contain different data

# z <- list(z1, z2, z3, z4)
# Populate list z dynamically from many objects, looked up by name
z <- mget(paste0("z", 1:4))


df1_all <- bind_rows(lapply(z, function(x) x$df1))
df2_all <- bind_rows(lapply(z, function(x) x$df2))
df3_all <- bind_rows(lapply(z, function(x) x$df3))


## Result for df3_all
tail(df3_all)
##    id name
## 35 125    S
## 36 126    T
## 37 127    U
## 38 128    V
## 39 129    W
## 40 130    X
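
To keep track of which object each row came from, bind_rows accepts an .id argument. A small sketch along the same lines as the code above, assuming the list is named (as it is when built with mget); the toy data frames and the column name "source" are illustrative only:

```r
library(dplyr)

# Hypothetical toy data, smaller than the example above
df_a <- data.frame(id = 1:3, name = LETTERS[1:3], stringsAsFactors = FALSE)
df_b <- data.frame(id = 4:6, name = LETTERS[4:6], stringsAsFactors = FALSE)

z <- list(
  z1 = list(df1 = df_a, df2 = df_b),
  z2 = list(df1 = df_a, df2 = df_b)
)

# .id adds a column holding the name of the list element each row came from
df2_all <- bind_rows(lapply(z, function(x) x$df2), .id = "source")

df2_all
```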

Upvotes: 1

G. Grothendieck

Reputation: 270160

Try this:

lapply(list1, "[[", "df2")

or if you want to rbind them together:

do.call("rbind", lapply(list1, "[[", "df2"))

Provided list1 is a named list, the row names in the resulting data frame will identify the origin of each row.

No packages are used.

Note

We can use this input to test the code above. BOD is a built-in data frame:

z <- list(df1 = BOD, df2 = BOD, df3 = BOD)
list1 <- list(z1 = z, z2 = z)
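
As a sketch of how the row names can be put to use with this test input, the snippet below turns them into an ordinary column. The column name "source" and the regular expression are illustrative choices, and the approach assumes the extracted data frames have default (automatic) row names:

```r
z <- list(df1 = BOD, df2 = BOD, df3 = BOD)
list1 <- list(z1 = z, z2 = z)

merged <- do.call("rbind", lapply(list1, "[[", "df2"))

# Row names have the form "<object name>.<row number>", e.g. "z1.1", "z2.6".
# Strip the trailing ".<row number>" to recover the object name.
merged$source <- sub("\\.[^.]*$", "", rownames(merged))
rownames(merged) <- NULL

head(merged)
```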

Upvotes: 1

Paul

Reputation: 9107

It sounds like you want to pull out all the df1 dataframes and rbind them together, then do the same for the other dataframes. You can use purrr::map_dfr to extract an element by name from each object in the list and row-bind the results together.

library('tidyverse')

dummy_df <- list(
  df1 = iris,
  df2 = cars,
  df3 = CO2)

list1 <- list(
  z1 = dummy_df,
  z2 = dummy_df,
  z3 = dummy_df)

df1 <- map_dfr(list1, 'df1')
df2 <- map_dfr(list1, 'df2')
df3 <- map_dfr(list1, 'df3')

If you wanted to do it in base R, you can use lapply.

df1 <- lapply(list1, function(x) x$df1)
df1_merged <- do.call(rbind, df1)
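
For the purrr route, note that purrr 1.0.0 and later mark map_dfr as superseded in favour of map() followed by list_rbind(). A sketch of that variant, assuming a recent purrr is installed; the column name "source" is an illustrative choice:

```r
library(purrr)

dummy_df <- list(df1 = iris, df2 = cars, df3 = CO2)
list1 <- list(z1 = dummy_df, z2 = dummy_df, z3 = dummy_df)

# map() pulls the element named "df2" out of each object;
# list_rbind() stacks the results and records the list names in `source`
df2 <- list_rbind(map(list1, "df2"), names_to = "source")

head(df2)
```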

Upvotes: 2
