vera_kkk

Reputation: 47

concatenate data based on a certain sequence

My data look like this. This is just a shortened version; the real data have over 20,000,000 rows, with the variable 'day' ranging from 1 to 232.

day time
1   2
1   2
2   2
2   3
3   4
3   5
4   4
4   2

I also have a vector of 1000 values randomly selected from the values of the variable day (1-232), say

df=c(3,4,1,2,...,4,1,3)

I want to create a new dataset ordered according to this sequence. We first extract the rows with day=3 from the data, then extract the rows with day=4 and put them after, then extract day=1 and rbind it thereafter, and so on. For example, for the first 4 elements of the sequence the result should look like this:

day time
3   4
3   5
4   4
4   2
1   2
1   2
2   2
2   3

Upvotes: 1

Views: 139

Answers (2)

r2evans

Reputation: 160827

Base R method:

x <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L), time = c(2L, 
2L, 2L, 3L, 4L, 5L, 4L, 2L)), class = "data.frame", row.names = c(NA, 
-8L))
df <- c(3,4,1,2,4,1,3)
do.call("rbind.data.frame", lapply(df, function(i) subset(x, day == i)))
#    day time
# 5    3    4
# 6    3    5
# 7    4    4
# 8    4    2
# 1    1    2
# 2    1    2
# 3    2    2
# 4    2    3
# 71   4    4
# 81   4    2
# 11   1    2
# 21   1    2
# 51   3    4
# 61   3    5

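The use of do.call("rbind.data.frame", ...) goes through the usual data.frame instantiation, so in R versions before 4.0.0 (where stringsAsFactors defaulted to TRUE) character columns may be converted to factors. If your real data has any columns of type character, you will likely want to do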

do.call("rbind.data.frame", c(lapply(df, function(i) subset(x, day == i)), stringsAsFactors = FALSE))

Also, it could easily be replaced (without the risk of factors) with data.table::rbindlist or dplyr::bind_rows.
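
With data on the scale described in the question (over 20,000,000 rows and a 1000-element sequence), calling subset() once per sequence element scans the full table each time. Here is a minimal sketch of a base R alternative that is not part of the original answer: group the row numbers by day once, then index x a single time. The names rows_by_day, idx and res are made up for illustration, and the approach assumes every value in df appears in x$day (values that do not are silently dropped).

```r
x <- data.frame(day  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
                time = c(2L, 2L, 2L, 3L, 4L, 5L, 4L, 2L))
df <- c(3, 4, 1, 2, 4, 1, 3)

# Group row numbers by day once, instead of scanning x for every element of df
rows_by_day <- split(seq_len(nrow(x)), x$day)

# Look up each day's rows in sequence order (list names are the day values
# as character), then subset x in a single step
idx <- unlist(rows_by_day[as.character(df)], use.names = FALSE)
res <- x[idx, ]
```

Because the result is built by row indexing rather than by rbind, column types are left untouched, so the stringsAsFactors concern does not arise here.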

Upvotes: 3

Chase

Reputation: 69231

If I understand correctly, you can do this in a pretty straightforward manner with a data.table join:

library(data.table)
df <- fread(text = "day time
1   2
1   2
2   2
2   3
3   4
3   5
4   4
4   2", header = TRUE)

seqs <- data.table(day = c(3,4,1,2,4,1,3))

df[seqs, on = "day"]
#>     day time
#>  1:   3    4
#>  2:   3    5
#>  3:   4    4
#>  4:   4    2
#>  5:   1    2
#>  6:   1    2
#>  7:   2    2
#>  8:   2    3
#>  9:   4    4
#> 10:   4    2
#> 11:   1    2
#> 12:   1    2
#> 13:   3    4
#> 14:   3    5

Created on 2019-02-10 by the reprex package (v0.2.1)
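
One caveat when scaling this join up, offered as a hedged sketch rather than part of the original answer: because the sequence repeats day values, the result can have many more rows than the inputs, and data.table stops with an error when a join produces more than nrow(x) + nrow(i) rows unless allow.cartesian = TRUE is set. The toy example stays just under that limit (14 rows against 8 + 7), but the full data from the question would most likely exceed it. The names x, seqs and res below are illustrative.

```r
library(data.table)

x <- data.table(day  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
                time = c(2L, 2L, 2L, 3L, 4L, 5L, 4L, 2L))
seqs <- data.table(day = c(3L, 4L, 1L, 2L, 4L, 1L, 3L))

# Rows come back in the order of seqs; allow.cartesian = TRUE permits the
# result to grow beyond nrow(x) + nrow(seqs) when days repeat in seqs
res <- x[seqs, on = "day", allow.cartesian = TRUE]
```

Setting the flag has no effect on the small example; it only matters once the repeated days multiply the output past the limit.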

Upvotes: 2
