Reputation: 7409
Given two dataframes a
and b
:
> a
a b c
1 -0.2246894 -1.48167912 -1.65099363
2 0.5559320 -0.87898575 -0.15634590
3 1.8469466 -0.01487524 -0.53098215
4 -0.6875051 0.23880967 0.01824621
5 -0.6735163 0.75485292 0.44154092
> b
a c
1 0.4287284 -0.3295925
2 0.5201492 0.3341251
3 -2.6355570 1.7916780
4 -1.3645337 1.3642276
5 -0.4954542 -0.6660001
Is there a simple way to concatenate these so as to return a new data frame of the form below?
> new
a b c
1 -0.2246894 -1.48167912106676 -1.65099363
2 0.5559320 -0.878985746842256 -0.15634590
3 1.8469466 -0.0148752354840942 -0.53098215
4 -0.6875051 0.238809666690982 0.01824621
5 -0.6735163 0.754852923524198 0.44154092
6 0.4287284 NA -0.32959248
7 0.5201492 NA 0.33412510
8 -2.6355570 NA 1.79167801
9 -1.3645337 NA 1.36422764
10 -0.4954542 NA -0.66600006
I want to merge the dataframes, match the headers and insert NA
in for positions in dataframe b
where the header is missing.
Upvotes: 173
Views: 441292
Reputation: 678
Use merge()
from the base package
First create the two dataframes, named a
and b
.
a <- data.frame(a = c(-0.2246894, 0.5559320, 1.8469466, -0.6875051, -0.6735163),
b = c(-1.48167912, -0.87898575, -0.01487524, 0.23880967, 0.75485292),
c = c(-1.65099363, -0.15634590, -0.53098215, 0.01824621, 0.44154092))
b <- data.frame(a = c(0.4287284, 0.5201492, -2.6355570, -1.3645337, -0.4954542),
c = c(-0.3295925, 0.3341251, 1.7916780, 1.3642276, -0.6660001))
Now let's merge the two dataframes together using the merge()
function from the base package.
merge(a, b, all = TRUE, sort = FALSE)[c("a","b","c")]
all = TRUE
option is needed here when you have values of a variable that are not known. This option will populate missing values with NA
.sort = FALSE
option says not to reorder the rows.[c("a","b","c")]
is some basic indexing to reorder the columns of the dataframe which merge() orders by default in this particular case as [c("a","c","b")]
.If you are not too picky about the ordering of the rows and columns, you can simply use:
merge(a, b, all = TRUE)
Upvotes: 1
Reputation: 900
you can use the function
bind_rows(a,b)
from the dplyr library
Upvotes: 37
Reputation: 1457
Here's a simple little function that will rbind two datasets together after auto-detecting what columns are missing from each and adding them with all NA
s.
For whatever reason this returns MUCH faster on larger datasets than using the merge
function.
fastmerge <- function(d1, d2) {
d1.names <- names(d1)
d2.names <- names(d2)
# columns in d1 but not in d2
d2.add <- setdiff(d1.names, d2.names)
# columns in d2 but not in d1
d1.add <- setdiff(d2.names, d1.names)
# add blank columns to d2
if(length(d2.add) > 0) {
for(i in 1:length(d2.add)) {
d2[d2.add[i]] <- NA
}
}
# add blank columns to d1
if(length(d1.add) > 0) {
for(i in 1:length(d1.add)) {
d1[d1.add[i]] <- NA
}
}
return(rbind(d1, d2))
}
Upvotes: 14
Reputation: 21459
You want "rbind".
b$b <- NA
new <- rbind(a, b)
rbind requires the data frames to have the same columns.
The first line adds column b to data frame b.
Results
> a <- data.frame(a=c(0,1,2), b=c(3,4,5), c=c(6,7,8))
> a
a b c
1 0 3 6
2 1 4 7
3 2 5 8
> b <- data.frame(a=c(9,10,11), c=c(12,13,14))
> b
a c
1 9 12
2 10 13
3 11 14
> b$b <- NA
> b
a c b
1 9 12 NA
2 10 13 NA
3 11 14 NA
> new <- rbind(a,b)
> new
a b c
1 0 3 6
2 1 4 7
3 2 5 8
4 9 NA 12
5 10 NA 13
6 11 NA 14
Upvotes: 271
Reputation: 69
You may use rbind
but in this case you need to have the same number of columns in both tables, so try the following:
b$b<-as.double(NA) #keeping numeric format is essential for further calculations
new<-rbind(a,b)
Upvotes: 5