dwdionis
dwdionis

Reputation: 83

Modify data frames in list to have same # of rows

I'm trying to combine data frames (hundreds of them), but they have different numbers of rows.

df1 <- data.frame(c(7,5,3,4,5), c(43,56,23,78,89))

df2 <- data.frame(c(7,5,3,4,5,8,5), c(43,56,23,78,89,45,78))

df3 <- data.frame(c(7,5,3,4,5,8,5,6,7), c(43,56,23,78,89,45,78,56,67))

colnames(df1) <- c("xVar1","xVar2")

colnames(df2) <- c("yVar1","yVar2")

colnames(df3) <- c("zVar1","zVar2")

a1 <- list(df1,df2,df3)

a1 is what is my initial data actually looks like when I get it.

Now if I do:

b1 <- as.data.frame(a1) 

I get an error, because the # of rows is not the same in the data (this would work fine if the # of rows was the same).

How do I make the # of rows equal or work around this issue?

I would like to be able to merge the data in this way (here is a working example with the same # of rows):

df1b <- data.frame(c(7,5,3,4,5), c(43,56,23,78,89))

df2b <- data.frame(c(7,5,3,4,6), c(43,56,24,48,89))

df3b <- data.frame(c(7,5,3,4,5), c(43,56,23,78,89))

colnames(df1b) <- c("xVar1","xVar2")

colnames(df2b) <- c("yVar1","yVar2")

colnames(df3b) <- c("zVar1","zVar2")

a2 <- list(df1b,df2b,df3b)

b2 <- as.data.frame(a2)

Thanks!

Upvotes: 0

Views: 108

Answers (3)

23stacks1254
23stacks1254

Reputation: 389

One can add a key (row count as variable value in this case) to each dataframe then merge by the key.

# get list of dfs (should prob import data into a list of dfs instead)
list_df<-mget(ls(pattern = "df[0-9]"))


#add newcolumn -- "key"
list_df<-lapply(list_df, function(df, newcol) {

  df[[newcol]]<-seq(nrow(df))
  return(df)

}, "key")


#merge function
MergeAllf <- function(x, y){
  df <- merge(x, y, by= "key", all.x= T, all.y= T)
}

#pass list to merge funct
library(tidyverse)
data <- Reduce(MergeAllf, list_df)%>%
  select(key, everything())#reorder or can drop "key"


data
 key xVar1 xVar2 yVar1 yVar2 zVar1 zVar2
1   1     7    43     7    43     7    43
2   2     5    56     5    56     5    56
3   3     3    23     3    23     3    23
4   4     4    78     4    78     4    78
5   5     5    89     5    89     5    89
6   6    NA    NA     8    45     8    45
7   7    NA    NA     5    78     5    78
8   8    NA    NA    NA    NA     6    56
9   9    NA    NA    NA    NA     7    67

Upvotes: 1

NM_
NM_

Reputation: 1999

Solution 1

You can achieve this with rbindlist(). Note that the column names will be the column names of the first data frame in the list:

library(data.table)
b1 = data.frame(rbindlist(a1))
> b1
   xVar1 xVar2
1      7    43
2      5    56
3      3    23
4      4    78
5      5    89
6      7    43
7      5    56
8      3    23
9      4    78
10     5    89
11     8    45
12     5    78
13     7    43
14     5    56
15     3    23
16     4    78
17     5    89
18     8    45
19     5    78
20     6    56
21     7    67

Solution 2

Alternatively, you make all the columns have the same name, then bind by row:

b1 = lapply(a1, setNames, c("Var1","Var2"))

Now you can bind by rows:

b1 = do.call(dplyr::bind_rows, b1)
> b1
   Var1 Var2
1     7   43
2     5   56
3     3   23
4     4   78
5     5   89
6     7   43
7     5   56
8     3   23
9     4   78
10    5   89
11    8   45
12    5   78
13    7   43
14    5   56
15    3   23
16    4   78
17    5   89
18    8   45
19    5   78
20    6   56
21    7   67

Upvotes: 0

Julian_Hn
Julian_Hn

Reputation: 2141

cbind.fill from rowr provides functionality for this and fills missing elements with NA:

library(purrr)
library(rowr)
b1 <- purrr::reduce(a1,cbind.fill,fill=NA)

Upvotes: 1

Related Questions