Tom
Tom

Reputation: 65

Replacement of plyr::cbind.fill in dplyr?

I apologize if this question is elementary, but I've been scouring the internet and I can't seem to find a simple solution.

I currently have a list of R objects (named vectors or dataframes of 1 variable, I can work with either), and I want to join them into 1 large dataframe with 1 row for each unique name/rowname, and 1 column for each element in the original list.

My starting list looks something like:

l1 <- list(df1 = data.frame(c(1,2,3), row.names = c("A", "B", "C")), 
       df2 = data.frame(c(2,6), row.names = c("B", "D")),
       df3 = data.frame(c(3,6,9), row.names = c("C", "D", "A")),
       df4 = data.frame(c(4,12), row.names = c("A", "E")))

And I want the output to look like:

data.frame("df1" = c(1,2,3,NA,NA),
+            "df2" = c(NA,2,NA,6,NA),
+            "df3" = c(9,NA,3,6,NA),
+            "df4" = c(4,NA,NA,NA,12), row.names = c("A", "B", "C", "D", "E"))
  df1 df2 df3 df4
A   1  NA   9   4
B   2   2  NA  NA
C   3  NA   3  NA
D  NA   6   6  NA
E  NA  NA  NA  12

I don't mind if the fill values are NA or 0 (ultimately I want 0 but that's an easy fix).

I'm almost positive that plyr::cbind.fill does exactly this, but I have been using dplyr in the rest of my script and I don't think using both is a good idea. dplyr::bind_cols does not seem to work with vectors of different lengths. I'm aware a very similar question has been asked here: R: Is there a good replacement for plyr::rbind.fill in dplyr? but as I mentioned, this solution doesn't actually seem to work. Neither does dplyr::full_join, even wrapped in a do.call. Is there a straightforward solution to this, or is the only solution to write a custom function?

Upvotes: 5

Views: 4107

Answers (3)

tmfmnk
tmfmnk

Reputation: 39858

Yet another purrr and dplyr option could be:

l1 %>%
 map2_dfr(.x = ., .y = names(.), ~ setNames(.x, .y) %>%
           rownames_to_column()) %>%
 group_by(rowname) %>%
 summarise_all(~ ifelse(all(is.na(.)), NA, first(na.omit(.))))

  rowname   df1   df2   df3   df4
  <chr>   <dbl> <dbl> <dbl> <dbl>
1 A           1    NA     9     4
2 B           2     2    NA    NA
3 C           3    NA     3    NA
4 D          NA     6     6    NA
5 E          NA    NA    NA    12

Upvotes: 2

akrun
akrun

Reputation: 887291

We can convert the rownames to a column with rownames_to_column, then rename the second column, bind the list elements with bind_rows, and reshape to 'wide' with pivot_wider

library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
map_dfr(l1, ~ rownames_to_column(.x, 'rn') %>% 
              rename_at(2, ~'v1'), .id = 'grp') %>%        
   pivot_wider(names_from = grp, values_from = v1) %>% 
   column_to_rownames('rn')

Upvotes: 7

camille
camille

Reputation: 16842

Here's a way with some purrr and dplyr functions. Create column names to represent each data frame—since each has only one column, this is easy with setNames, but with more columns you could use dplyr::rename. Do a full-join across the whole list based on the original row names, and fill NAs with 0.

library(dplyr)
library(purrr)

l1 %>%
  imap(~setNames(.x, .y)) %>%
  map(tibble::rownames_to_column) %>%
  reduce(full_join, by = "rowname") %>%
  mutate_all(tidyr::replace_na, 0)
#>   rowname df1 df2 df3 df4
#> 1       A   1   0   9   4
#> 2       B   2   2   0   0
#> 3       C   3   0   3   0
#> 4       D   0   6   6   0
#> 5       E   0   0   0  12

Upvotes: 6

Related Questions