Reputation: 119
I have a situation where I have two different datasets from different time periods. While many of the columns are the same, some are not. These "different" columns are ancillary data which should be retained.
I want to combine these two datasets, leaving NA where the value does not exist. Note that these two tables do not have any foreign keys, and should not be joined, just stacked together.
Example:
data.a <- structure(list(lat = c(50L, -30L, -33L), lon = c(-40L, -30L,
-50L), mag = c(3.5, 1.1, 2.3)), .Names = c("lat", "lon", "mag"
), class = "data.frame", row.names = c(NA, -3L))
data.b <- structure(list(lat = c(44L, -30L, -33L), lon = c(-40L, -30L,
-55L), elev = c(1050, 2.4, 2.3)), .Names = c("lat", "lon", "elev"
), class = "data.frame", row.names = c(NA, -3L))
data.a
lat lon mag
50 -40 3.5
-30 -30 1.1
-33 -50 2.3
data.b
lat lon elev
44 -40 1050.0
-30 -30 2.4
-33 -55 2.3
Expected result:
lat lon mag elev
50 -40 3.5 NA
-30 -30 1.1 NA
-33 -50 2.3 NA
44 -40 NA 1050.0
-30 -30 NA 2.4
-33 -55 NA 2.3
What is the best way to achieve this result? I see plenty of utilities to perform joins, but I am unsure how to do this operation.
Upvotes: 3
Views: 141
Reputation: 78832
or with the more modern dplyr
:
dplyr::bind_rows(data.a, data.b)
## Source: local data frame [6 x 4]
##
## lat lon mag elev
## (int) (int) (dbl) (dbl)
## 1 50 -40 3.5 NA
## 2 -30 -30 1.1 NA
## 3 -33 -50 2.3 NA
## 4 44 -40 NA 1050.0
## 5 -30 -30 NA 2.4
## 6 -33 -55 NA 2.3
or data.table
:
data.table::rbindlist(list(data.a, data.b), fill=TRUE)
## lat lon mag elev
## 1: 50 -40 3.5 NA
## 2: -30 -30 1.1 NA
## 3: -33 -50 2.3 NA
## 4: 44 -40 NA 1050.0
## 5: -30 -30 NA 2.4
## 6: -33 -55 NA 2.3
or gtools
:
gtools::smartbind(data.a, data.b)
## lat lon mag elev
## 1:1 50 -40 3.5 NA
## 1:2 -30 -30 1.1 NA
## 1:3 -33 -50 2.3 NA
## 2:1 44 -40 NA 1050.0
## 2:2 -30 -30 NA 2.4
## 2:3 -33 -55 NA 2.3
or, if you don't want to use any packages:
rbind_fill <- function(...) {
dfs <- list(...)
cols <- Reduce(union, lapply(dfs, colnames))
do.call(rbind, lapply(dfs, function(df) {
df_cols <- setdiff(cols, colnames(df))
df[,df_cols] <- NA
df
}))
}
rbind_fill(data.a, data.b)
## lat lon mag elev
## 1 50 -40 3.5 NA
## 2 -30 -30 1.1 NA
## 3 -33 -50 2.3 NA
## 4 44 -40 NA 1050.0
## 5 -30 -30 NA 2.4
## 6 -33 -55 NA 2.3
Upvotes: 7
Reputation: 2064
library(dplyr)
result <- bind_rows(data.a, data.b)
> result
Source: local data frame [6 x 4]
lat lon mag elev
1 50 -40 3.5 NA
2 -30 -30 1.1 NA
3 -33 -50 2.3 NA
4 44 -40 NA 1050.0
5 -30 -30 NA 2.4
6 -33 -55 NA 2.3
Upvotes: 4
Reputation: 34621
Try:
library(plyr)
rbind.fill(data.a, data.b)
lat lon mag elev
1 50 -40 3.5 NA
2 -30 -30 1.1 NA
3 -33 -50 2.3 NA
4 44 -40 NA 1050.0
5 -30 -30 NA 2.4
6 -33 -55 NA 2.3
Upvotes: 6