concatenate data based on a certain sequence

Question

My data look like this, with the variable day ranging from 1 to 232. This is just a shortened version; the real data have over 20,000,000 rows, with day ranging from 1 to 232.

I also have a vector of 1000 values randomly sampled from the values of day (1-232), say

df=c(3,4,1,2,...,4,1,3)

I want to create a new dataset ordered according to that sequence. That is, we first extract the rows with day=3 from the data, then extract the rows with day=4 and append them, then extract day=1 and rbind it after that, and so on. For example, the result for the first 4 elements of the sequence should look like this:

r2evans · Accepted Answer

Base R method:

x <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L), time = c(2L, 
2L, 2L, 3L, 4L, 5L, 4L, 2L)), class = "data.frame", row.names = c(NA, 
-8L))
df <- c(3,4,1,2,4,1,3)
do.call("rbind.data.frame", lapply(df, function(i) subset(x, day == i)))
#    day time
# 5    3    4
# 6    3    5
# 7    4    4
# 8    4    2
# 1    1    2
# 2    1    2
# 3    2    2
# 4    2    3
# 71   4    4
# 81   4    2
# 11   1    2
# 21   1    2
# 51   3    4
# 61   3    5

Using do.call("rbind.data.frame", ...) follows the usual data.frame construction defaults, so on older R versions (before 4.0.0) character columns may be converted to factors. If your real data has any columns of type character, you will likely want to do

do.call("rbind.data.frame", c(lapply(df, function(i) subset(x, day == i)), stringsAsFactors = FALSE))

Alternatively, the do.call("rbind.data.frame", ...) step could easily be replaced (without the risk of factors) with data.table::rbindlist or dplyr::bind_rows.
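With the 20-million-row data mentioned in the question, calling subset() once per element of df rescans the whole data frame every time. One possible alternative, which is not part of the answer above, is to group the row numbers by day once and then index with them. This is a minimal base R sketch, assuming every value in df appears in x$day; the names idx, rows, and res are only for illustration:

```r
# Sample data from the answer above
x <- data.frame(day  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
                time = c(2L, 2L, 2L, 3L, 4L, 5L, 4L, 2L))
df <- c(3, 4, 1, 2, 4, 1, 3)

# Group the row numbers by day once (a single pass over the data)
idx <- split(seq_len(nrow(x)), x$day)

# Look up each day's row numbers in the order given by df, then flatten
rows <- unlist(idx[as.character(df)], use.names = FALSE)

# One indexing step; R makes the row names of repeated rows unique
res <- x[rows, ]
```

The rows should come out in the same order as in the rbind version; only the row names of the repeated rows differ.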
