Gabriel G.

Reputation: 864

Bind dataframes in a list two by two (or by name) - R

Let's say I have this list of data frames:

DF1_A <- data.frame (first_column  = c("A", "B","C"),
                     second_column = c(5, 5, 5),
                     third_column = c(1, 1, 1)
)

DF1_B <- data.frame (first_column  = c("A", "B","E"),
                     second_column = c(1, 1, 5),
                     third_column = c(1, 1, 1)
)

DF2_A <- data.frame (first_column  = c("E", "F","G"),
                     second_column = c(1, 1, 5),
                     third_column = c(1, 1, 1)
)

DF2_B <- data.frame (first_column  = c("K", "L","B"),
                     second_column = c(1, 1, 5),
                     third_column = c(1, 1, 1)
)

mylist <- list(DF1_A, DF1_B, DF2_A, DF2_B)
names(mylist) = c("DF1_A", "DF1_B", "DF2_A", "DF2_B")


mylist =  lapply(mylist, function(x){
  x[, "first_column"] <- as.character(x[, "first_column"])
  x
})

I want to bind them by name (all DF1, all DF2, etc.) or, equivalently, two by two in this ordered, named list. Keeping the named-list structure is important so that I can keep track of the groups (for example, DF1_A and DF1_B become DF1, or something similar, in names(mylist)).

Some values are duplicated across data frames, and I want to keep those rows (so the result will contain repeated values, such as A in first_column).

I have looked for clues here on Stack Overflow, but most questions are about binding data frames irrespective of their names or order.

Final result would look something like this:

mylist
DF1
DF2

DF1
first_column    second_column   third_column
A               1               1
A               5               1
B               1               1
B               5               1
C               5               1
E               5               1

Upvotes: 2

Views: 70

Answers (3)

user10917479

One of many obligatory tidyverse solutions can be this.

library(dplyr)    # provides bind_rows()
library(purrr)
library(stringr)

# find the unique DF names (the part before the first underscore)
unique_df <- set_names(unique(str_split_fixed(names(mylist), "_", 2)[, 1]))

# for each unique name, keep the matching list elements and stack their rows
map(unique_df, ~ keep(mylist, str_starts(names(mylist), .x))) %>%
  map(bind_rows)

Also, for things like this, bind_rows() from dplyr has a .id argument that adds a column holding the list element name while stacking the rows. That can be a helpful alternative: bind everything, derive the group name however you'd like, and then split().
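
A rough sketch of that .id route, in case it helps. The smaller two-column mylist and the column names source and group are made up for illustration; only bind_rows(), sub() and split() are the actual functions involved.

```r
library(dplyr)

# Small stand-in for the question's list (assumed data, two data frames per group)
mylist <- list(
  DF1_A = data.frame(first_column = c("A", "B"), second_column = c(5, 5),
                     stringsAsFactors = FALSE),
  DF1_B = data.frame(first_column = c("A", "E"), second_column = c(1, 5),
                     stringsAsFactors = FALSE),
  DF2_A = data.frame(first_column = c("E", "F"), second_column = c(1, 1),
                     stringsAsFactors = FALSE),
  DF2_B = data.frame(first_column = c("K", "L"), second_column = c(1, 5),
                     stringsAsFactors = FALSE)
)

# Stack all rows; "source" (a name chosen here) records the list element name
stacked <- bind_rows(mylist, .id = "source")

# Derive the group key by dropping everything from the first underscore on
stacked$group <- sub("_.*$", "", stacked$source)

# Split back into a named list with one data frame per group,
# keeping only the original columns
result <- split(stacked[, c("first_column", "second_column")], stacked$group)
```

Rows that share a value (A shows up in both DF1 data frames) are all kept, and names(result) carries the group names.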

Upvotes: 0

ThomasIsCoding

Reputation: 102181

Do you mean something like this?

lapply(
  split(mylist, gsub("_.*", "", names(mylist))),
  function(v) `row.names<-`((out <- do.call(rbind, v))[do.call(order, out), ], NULL)
)

which gives

$DF1
  first_column second_column third_column
1            A             1            1
2            A             5            1
3            B             1            1
4            B             5            1
5            C             5            1
6            E             5            1

$DF2
  first_column second_column third_column
1            B             5            1
2            E             1            1
3            F             1            1
4            G             5            1
5            K             1            1
6            L             1            1
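
If the one-liner is hard to read, the same steps can be spelled out roughly as below. This is only an unpacked sketch: the helper name bind_and_sort and the smaller two-column mylist are assumptions made here for illustration.

```r
# Small stand-in for the question's list (assumed data)
mylist <- list(
  DF1_A = data.frame(first_column = c("A", "B"), second_column = c(5, 5),
                     stringsAsFactors = FALSE),
  DF1_B = data.frame(first_column = c("A", "E"), second_column = c(1, 5),
                     stringsAsFactors = FALSE),
  DF2_A = data.frame(first_column = c("E", "F"), second_column = c(1, 1),
                     stringsAsFactors = FALSE),
  DF2_B = data.frame(first_column = c("K", "L"), second_column = c(1, 5),
                     stringsAsFactors = FALSE)
)

# Group the list elements by the part of the name before the underscore
groups <- split(mylist, sub("_.*$", "", names(mylist)))

# Hypothetical helper: stack one group's data frames and sort the rows
bind_and_sort <- function(dfs) {
  out <- do.call(rbind, dfs)          # stack rows; duplicated values are kept
  out <- out[do.call(order, out), ]   # sort by every column, left to right
  rownames(out) <- NULL               # drop the "DF1_A.1"-style row names
  out
}

result <- lapply(groups, bind_and_sort)
```

The sort is only there to reproduce the ordering shown in the question; drop that line if the original row order should be preserved.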

Upvotes: 3

Rui Barradas

Reputation: 76495

Here is a solution with Map, but it only works when there are exactly two suffixes (here A and B). If you want to merge, use the first Map call; if you want to keep duplicates, use the second one, with rbind.

sp <- split(mylist, sub("^DF.*_", "", names(mylist)))
res1 <- Map(function(x, y)merge(x, y, all = TRUE), sp[["A"]], sp[["B"]])
res2 <- Map(function(x, y)rbind(x, y), sp[["A"]], sp[["B"]])

names(res1) <- sub("_.*$", "", names(res1))
names(res2) <- sub("_.*$", "", names(res2))
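
To see how the two differ, here is a tiny sketch with two made-up data frames (x, y and their columns are not from the question). With all = TRUE and no by argument, merge() joins on every shared column, so a row that is identical in both inputs appears once, while rbind() keeps both copies.

```r
# Toy data frames (assumed); the row (A, 1) appears in both
x <- data.frame(id = c("A", "B"), value = c(1, 2), stringsAsFactors = FALSE)
y <- data.frame(id = c("A", "C"), value = c(1, 3), stringsAsFactors = FALSE)

# Full outer join on all shared columns: the shared (A, 1) row is matched
merged <- merge(x, y, all = TRUE)

# Plain row binding: the shared (A, 1) row is kept twice
bound <- rbind(x, y)
```

For the question as asked (keep the duplicated rows), the rbind variant is the one that applies.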

Upvotes: 1
