Tom
Tom

Reputation: 2341

Merging datasets of which only some columns match

I want to merge about 27 different datasets. For each of these datasets there are many identical columns, but also many columns that are not identical. I would like the identical columns to "stack" while the non-identical columns are simply added.

Let's assume that datasets 1 and 2 look as follows:

library(data.table)
df1 <- fread(
    "A   B   C  iso   year   
     0   1   1  NLD   2009   
     1   0   2  NLD   2009   
     0   0   3  AUS   2011   
     1   0   4  AUS   2011   
     0   1   7  NLD   2008   
     1   0   1  NLD   2008   
     0   0   3  AUS   2012",
  header = TRUE
)
df2 <- fread(
    "A   B   D  E  iso   year   
     0   1   1  NA ECU   2009   
     1   0   2  0  ECU   2009   
     0   0   3  0  BRA   2011   
     1   0   4  0  BRA   2011   
     0   1   7  NA ECU   2008   
     1   0   1  0  ECU   2008   
     0   0   3  2  BRA   2012   
     1   0   4  NA BRA   2012",
  header = TRUE
)

I would now like to put these df's together with:

rbind(df1, df2, fill=TRUE)

The merged dataset should look as follows:

df_merged <- fread(
    "A   B   C  D   E   iso   year   
     0   1   1  NA  NA  NLD   2009   
     1   0   2  NA  NA  NLD   2009   
     0   0   3  NA  NA  AUS   2011   
     1   0   4  NA  NA  AUS   2011   
     0   1   7  NA  NA  NLD   2008   
     1   0   1  NA  NA  NLD   2008   
     0   0   3  NA  NA  AUS   2012   
     0   1   NA  1  NA  ECU   2009   
     1   0   NA  2  0   ECU   2009   
     0   0   NA  3  0   BRA   2011   
     1   0   NA  4  0   BRA   2011   
     0   1   NA  7  NA  ECU   2008   
     1   0   NA  1  0   ECU   2008   
     0   0   NA  3  2   BRA   2012   
     1   0   NA  4  NA  BRA   2012",",
   header = TRUE
)

However for some reason, when applying this to the actual data I get the error:

    Error in rbind(deparse.level, ...) : 
      numbers of columns of arguments do not match

I have tried to recreate the error in my example (by making the amount of columns and rows different for the df's) but failed to do so.

Any ideas?

EDIT: So apparently the issue was that the classes in my two datasets were not identical in some occasions where the column names were. I will post an update after I got it to work.

Upvotes: 1

Views: 68

Answers (2)

Sotos
Sotos

Reputation: 51592

You can use rbindlist, i.e.

library(data.table)
rbindlist(list(df1, df2), fill = TRUE)

Upvotes: 2

Rushabh Patel
Rushabh Patel

Reputation: 2764

You can also use bind_rows function from dplyr package-

> bind_rows(df1,df2)

    A B  C iso year  D  E
 1: 0 1  1 NLD 2009 NA NA
 2: 1 0  2 NLD 2009 NA NA
 3: 0 0  3 AUS 2011 NA NA
 4: 1 0  4 AUS 2011 NA NA
 5: 0 1  7 NLD 2008 NA NA
 6: 1 0  1 NLD 2008 NA NA
 7: 0 0  3 AUS 2012 NA NA
 8: 0 1 NA ECU 2009  1 NA
 9: 1 0 NA ECU 2009  2  0
10: 0 0 NA BRA 2011  3  0
11: 1 0 NA BRA 2011  4  0
12: 0 1 NA ECU 2008  7 NA
13: 1 0 NA ECU 2008  1  0
14: 0 0 NA BRA 2012  3  2
15: 1 0 NA BRA 2012  4 NA

Upvotes: 1

Related Questions