Reputation: 91
My data.frame DATA
is
k l g
1 A 2004 12
2 B 2004 3.4
3 C 2004 4.5
Another data.frame DATA2
is
i d t
1 A 2012 22
2 B 2012 4.8
3 C 2012 5.6
I want to get
1 A 2004 12
1 A 2012 22
2 B 2004 3.4
2 B 2012 4.8
3 C 2004 4.5
3 C 2012 5.6
Upvotes: 8
Views: 4018
Reputation: 1059
Solution using dplyr
without the need for a key column and without modifying the original order (if the data frames weren't sorted by an "id" column).
library(dplyr)
stopifnot((N <- nrow(DATA)) == nrow(DATA2))
INTER_DATA <- setNames(DATA2, names(DATA)) %>%
bind_rows(DATA, .) %>%
arrange(rep(seq_len(N), length = n()))
Result:
k l g
1 A 2004 12.0
2 A 2012 22.0
3 B 2004 3.4
4 B 2012 4.8
5 C 2004 4.5
6 C 2012 5.6
Upvotes: 0
Reputation: 4187
You can also do this in base R with rbind
without the need to use extra packages, but you will have to set the columnnames of df2
the same as the columnnames in df1
:
colnames(df2) <- colnames(df1) # or: setNames(df2, colnames(df1))
new.df <- rbind(df1,df2)
new.df <- new.df[order(new.df$k),]
This will result in the following dataframe:
> new.df
k l g
1 A 2004 12.0
11 A 2012 22.0
2 B 2004 3.4
21 B 2012 4.8
3 C 2004 4.5
31 C 2012 5.6
Upvotes: 3
Reputation: 193637
You can try the interleave
function from the "gdata" package. However, this would require your inputs have the same column names and have the same number of rows.
The approach would be:
library(gdata) # for interleave
do.call(interleave, lapply(list(df1, df2), setNames, paste0("V", 1:ncol(df1))))
# V1 V2 V3
# 1 A 2004 12.0
# 11 A 2012 22.0
# 2 B 2004 3.4
# 21 B 2012 4.8
# 3 C 2004 4.5
# 31 C 2012 5.6
Alternatively, as mentioned in my comment @akrun's answer, depending on whether the first column is a grouping variable or not, you may want to modify his approach a little.
For instance, imagine there were a third data.frame
, with a different number of rows than the others. interleave
would not work on that, but the rbindlist
approach would.
df3 <- do.call(rbind, lapply(list(df1, df2), setNames, c("A", "B", "Z")))
rbindlist(list(df1, df2, df3), idcol = TRUE)[, N := sequence(.N), by = .id][order(N)]
# .id k l g N
# 1: 1 A 2004 12.0 1
# 2: 2 A 2012 22.0 1
# 3: 3 A 2004 12.0 1
# 4: 1 B 2004 3.4 2
# 5: 2 B 2012 4.8 2
# 6: 3 B 2004 3.4 2
# 7: 1 C 2004 4.5 3
# 8: 2 C 2012 5.6 3
# 9: 3 C 2004 4.5 3
# 10: 3 A 2012 22.0 4
# 11: 3 B 2012 4.8 5
# 12: 3 C 2012 5.6 6
Pay specific attention to the last three rows in comparison with @akrun's approach.
The equivalent in base R for that last "data.table" approach would be something like:
x <- do.call(rbind, lapply(c("df1", "df2", "df3"), function(x) {
setNames(cbind(rn = x, get(x)), c("id", paste0("V", 1:ncol(get(x)))))
}))
x[order(ave(as.numeric(x$id), x$id, FUN = seq_along)), ]
(So the moral is, use "data.table".)
Upvotes: 3
Reputation: 887501
We can try rbindlist
from data.table
. Place the datasets in a list
, rbind
them with rbindlist
and order
by the first column.
library(data.table)
rbindlist(list(df1, df2))[order(k)]
# k l g
#1: A 2004 12.0
#2: A 2012 22.0
#3: B 2004 3.4
#4: B 2012 4.8
#5: C 2004 4.5
#6: C 2012 5.6
Or using dplyr
library(dplyr)
bind_rows(df1, setNames(df2, names(df1))) %>%
arrange(k)
NOTE: I used df1
and df2
in place of DATA
and DATA2
as object names as it is easier to type.
Upvotes: 6