A.M.G16
A.M.G16

Reputation: 91

Merge two data frames to get alternate rows of each data frame in sequence

My data.frame DATA is

  k    l   g
1 A 2004  12
2 B 2004 3.4
3 C 2004 4.5

Another data.frame DATA2 is

  i    d   t
1 A 2012  22
2 B 2012 4.8
3 C 2012 5.6

I want to get

1 A 2004  12
1 A 2012  22
2 B 2004 3.4
2 B 2012 4.8
3 C 2004 4.5
3 C 2012 5.6

Upvotes: 8

Views: 4018

Answers (4)

leogama
leogama

Reputation: 1059

Solution using dplyr without the need for a key column and without modifying the original order (if the data frames weren't sorted by an "id" column).

library(dplyr)

stopifnot((N <- nrow(DATA)) == nrow(DATA2))

INTER_DATA <- setNames(DATA2, names(DATA)) %>%
    bind_rows(DATA, .) %>%
    arrange(rep(seq_len(N), length = n()))

Result:

  k    l    g
1 A 2004 12.0
2 A 2012 22.0
3 B 2004  3.4
4 B 2012  4.8
5 C 2004  4.5
6 C 2012  5.6

Upvotes: 0

h3rm4n
h3rm4n

Reputation: 4187

You can also do this in base R with rbind without the need to use extra packages, but you will have to set the columnnames of df2 the same as the columnnames in df1:

colnames(df2) <- colnames(df1) # or: setNames(df2, colnames(df1))
new.df <- rbind(df1,df2)
new.df <- new.df[order(new.df$k),]

This will result in the following dataframe:

> new.df
   k    l    g
1  A 2004 12.0
11 A 2012 22.0
2  B 2004  3.4
21 B 2012  4.8
3  C 2004  4.5
31 C 2012  5.6

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193637

You can try the interleave function from the "gdata" package. However, this would require your inputs have the same column names and have the same number of rows.

The approach would be:

library(gdata)      # for interleave
do.call(interleave, lapply(list(df1, df2), setNames, paste0("V", 1:ncol(df1))))
#    V1   V2   V3
# 1   A 2004 12.0
# 11  A 2012 22.0
# 2   B 2004  3.4
# 21  B 2012  4.8
# 3   C 2004  4.5
# 31  C 2012  5.6

Alternatively, as mentioned in my comment @akrun's answer, depending on whether the first column is a grouping variable or not, you may want to modify his approach a little.

For instance, imagine there were a third data.frame, with a different number of rows than the others. interleave would not work on that, but the rbindlist approach would.

df3 <- do.call(rbind, lapply(list(df1, df2), setNames, c("A", "B", "Z")))

rbindlist(list(df1, df2, df3), idcol = TRUE)[, N := sequence(.N), by = .id][order(N)]
#     .id k    l    g N
#  1:   1 A 2004 12.0 1
#  2:   2 A 2012 22.0 1
#  3:   3 A 2004 12.0 1
#  4:   1 B 2004  3.4 2
#  5:   2 B 2012  4.8 2
#  6:   3 B 2004  3.4 2
#  7:   1 C 2004  4.5 3
#  8:   2 C 2012  5.6 3
#  9:   3 C 2004  4.5 3
# 10:   3 A 2012 22.0 4
# 11:   3 B 2012  4.8 5
# 12:   3 C 2012  5.6 6

Pay specific attention to the last three rows in comparison with @akrun's approach.


The equivalent in base R for that last "data.table" approach would be something like:

x <- do.call(rbind, lapply(c("df1", "df2", "df3"), function(x) {
  setNames(cbind(rn = x, get(x)), c("id", paste0("V", 1:ncol(get(x)))))
}))
x[order(ave(as.numeric(x$id), x$id, FUN = seq_along)), ]

(So the moral is, use "data.table".)

Upvotes: 3

akrun
akrun

Reputation: 887501

We can try rbindlist from data.table. Place the datasets in a list, rbind them with rbindlist and order by the first column.

library(data.table)
rbindlist(list(df1, df2))[order(k)]
#   k    l    g
#1: A 2004 12.0
#2: A 2012 22.0
#3: B 2004  3.4
#4: B 2012  4.8
#5: C 2004  4.5
#6: C 2012  5.6

Or using dplyr

library(dplyr)
bind_rows(df1, setNames(df2, names(df1))) %>% 
           arrange(k)

NOTE: I used df1 and df2 in place of DATA and DATA2 as object names as it is easier to type.

Upvotes: 6

Related Questions