Dean
Dean

Reputation: 119

Merge columns of data in R

I have a situation where I have two different datasets from different time periods. While many of the columns are the same, some are not. These "different" columns are ancillary data which should be retained.

I want to combine these two datasets, leaving NA where the value does not exist. Note that these two tables do not have any foreign keys, and should not be joined, just stacked together.

Example:

data.a <- structure(list(lat = c(50L, -30L, -33L), lon = c(-40L, -30L, 
-50L), mag = c(3.5, 1.1, 2.3)), .Names = c("lat", "lon", "mag"
), class = "data.frame", row.names = c(NA, -3L))

data.b <- structure(list(lat = c(44L, -30L, -33L), lon = c(-40L, -30L, 
-55L), elev = c(1050, 2.4, 2.3)), .Names = c("lat", "lon", "elev"
), class = "data.frame", row.names = c(NA, -3L))

data.a
lat lon mag
 50 -40 3.5
-30 -30 1.1
-33 -50 2.3

data.b
lat lon   elev
 44 -40 1050.0
-30 -30    2.4
-33 -55    2.3

Expected result:

lat lon mag   elev
 50 -40 3.5     NA
-30 -30 1.1     NA
-33 -50 2.3     NA
 44 -40  NA 1050.0
-30 -30  NA    2.4
-33 -55  NA    2.3

What is the best way to achieve this result? I see plenty of utilities to perform joins, but I am unsure how to do this operation.

Upvotes: 3

Views: 141

Answers (3)

hrbrmstr
hrbrmstr

Reputation: 78832

or with the more modern dplyr:

dplyr::bind_rows(data.a, data.b)

## Source: local data frame [6 x 4]
## 
##     lat   lon   mag   elev
##   (int) (int) (dbl)  (dbl)
## 1    50   -40   3.5     NA
## 2   -30   -30   1.1     NA
## 3   -33   -50   2.3     NA
## 4    44   -40    NA 1050.0
## 5   -30   -30    NA    2.4
## 6   -33   -55    NA    2.3

or data.table:

data.table::rbindlist(list(data.a, data.b), fill=TRUE)

##    lat lon mag   elev
## 1:  50 -40 3.5     NA
## 2: -30 -30 1.1     NA
## 3: -33 -50 2.3     NA
## 4:  44 -40  NA 1050.0
## 5: -30 -30  NA    2.4
## 6: -33 -55  NA    2.3

or gtools:

gtools::smartbind(data.a, data.b)

##     lat lon mag   elev
## 1:1  50 -40 3.5     NA
## 1:2 -30 -30 1.1     NA
## 1:3 -33 -50 2.3     NA
## 2:1  44 -40  NA 1050.0
## 2:2 -30 -30  NA    2.4
## 2:3 -33 -55  NA    2.3

or, if you don't want to use any packages:

rbind_fill <- function(...) {
  dfs <- list(...)
  cols <- Reduce(union, lapply(dfs, colnames))
  do.call(rbind, lapply(dfs, function(df) {
    df_cols <- setdiff(cols, colnames(df))
    df[,df_cols] <- NA
    df
  }))
}

rbind_fill(data.a, data.b)

##   lat lon mag   elev
## 1  50 -40 3.5     NA
## 2 -30 -30 1.1     NA
## 3 -33 -50 2.3     NA
## 4  44 -40  NA 1050.0
## 5 -30 -30  NA    2.4
## 6 -33 -55  NA    2.3

Upvotes: 7

Randy Zwitch
Randy Zwitch

Reputation: 2064

library(dplyr)
result <- bind_rows(data.a, data.b)

> result
Source: local data frame [6 x 4]

  lat lon mag   elev
1  50 -40 3.5     NA
2 -30 -30 1.1     NA
3 -33 -50 2.3     NA
4  44 -40  NA 1050.0
5 -30 -30  NA    2.4
6 -33 -55  NA    2.3

Upvotes: 4

lroha
lroha

Reputation: 34621

Try:

library(plyr)
rbind.fill(data.a, data.b)

  lat lon mag   elev
1  50 -40 3.5     NA
2 -30 -30 1.1     NA
3 -33 -50 2.3     NA
4  44 -40  NA 1050.0
5 -30 -30  NA    2.4
6 -33 -55  NA    2.3

Upvotes: 6

Related Questions