Reputation: 47
My data look like this. This is just a shortened version; the real data have over 20,000,000 rows, with the variable 'day' ranging from 1 to 232.
day time
1 2
1 2
2 2
2 3
3 4
3 5
4 4
4 2
I also have a vector of 1000 values randomly sampled from the values of the variable day (1-232), say
df=c(3,4,1,2,...,4,1,3)
I want to create a new dataset ordered according to this sequence. That is, we first extract the rows with day=3 from the data, then extract the rows with day=4 and put them after, then extract day=1 and rbind those thereafter, and so on. For example, the result for the first 4 elements of the sequence should look like this:
day time
3 4
3 5
4 4
4 2
1 2
1 2
2 2
2 3
Upvotes: 1
Views: 139
Reputation: 160827
Base R method:
x <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L), time = c(2L,
2L, 2L, 3L, 4L, 5L, 4L, 2L)), class = "data.frame", row.names = c(NA,
-8L))
df <- c(3,4,1,2,4,1,3)
do.call("rbind.data.frame", lapply(df, function(i) subset(x, day == i)))
# day time
# 5 3 4
# 6 3 5
# 7 4 4
# 8 4 2
# 1 1 2
# 2 1 2
# 3 2 2
# 4 2 3
# 71 4 4
# 81 4 2
# 11 1 2
# 21 1 2
# 51 3 4
# 61 3 5
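The use of do.call("rbind.data.frame", ...) is subject to the usual data.frame instantiation behavior, meaning that if your real data has any columns of type character (and you are on an R version before 4.0.0, where stringsAsFactors defaults to TRUE), you will likely want to do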
do.call("rbind.data.frame", c(lapply(df, function(i) subset(x, day == i)), stringsAsFactors = FALSE))
Also, it could easily be replaced (without the risk of factors) with data.table::rbindlist or dplyr::bind_rows.
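With over 20 million rows and a sequence of 1000 days, calling subset() once per element of df rescans the whole table every time. The following is only a sketch of one possible alternative in base R, not a benchmarked solution: it groups the row numbers by day once, then looks them up in the order given by df. The names idx, rows and res are made up for illustration, and it assumes every value in df actually occurs in x$day.

```r
# Same sample data as above
x <- data.frame(day  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
                time = c(2L, 2L, 2L, 3L, 4L, 5L, 4L, 2L))
df <- c(3, 4, 1, 2, 4, 1, 3)

# Group the row numbers of x by day; this is the only full pass over the data
idx <- split(seq_len(nrow(x)), x$day)

# Look up each requested day by name, in the order given by df,
# and flatten the result into a single vector of row numbers
rows <- unlist(idx[as.character(df)], use.names = FALSE)

# One indexing operation replaces the repeated subset() + rbind
res <- x[rows, ]

res$day
# [1] 3 3 4 4 1 1 2 2 4 4 1 1 3 3
```

Because the rows are selected in a single indexing step, no rbind is involved, so the factor issue mentioned above does not come up. The row names of res will differ from those of the do.call() version.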
Upvotes: 3
Reputation: 69231
If I understand correctly, you can do this in a pretty straightforward manner with the data.table package:
library(data.table)
df <- fread(text = "day time
1 2
1 2
2 2
2 3
3 4
3 5
4 4
4 2", header = TRUE)
seqs <- data.table(day = c(3,4,1,2,4,1,3))
df[seqs, on = "day"]
#> day time
#> 1: 3 4
#> 2: 3 5
#> 3: 4 4
#> 4: 4 2
#> 5: 1 2
#> 6: 1 2
#> 7: 2 2
#> 8: 2 3
#> 9: 4 4
#> 10: 4 2
#> 11: 1 2
#> 12: 1 2
#> 13: 3 4
#> 14: 3 5
Created on 2019-02-10 by the reprex package (v0.2.1)
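One thing to watch for on the full data: because the sequence repeats days, each repeated value joins to the same group of rows again, and the result can grow well beyond the size of the original table. data.table stops such joins by default once the result would exceed nrow(x) + nrow(i) rows, and the allow.cartesian argument lifts that check. The sample above stays under the limit (14 rows against 8 + 7), so the sketch below only shows where the argument goes; x and res are names chosen here for illustration.

```r
library(data.table)

# Same sample data as above, built directly instead of via fread()
x <- data.table(day  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
                time = c(2L, 2L, 2L, 3L, 4L, 5L, 4L, 2L))
seqs <- data.table(day = c(3L, 4L, 1L, 2L, 4L, 1L, 3L))

# Rows come back in the order of seqs; allow.cartesian = TRUE permits a
# result larger than nrow(x) + nrow(seqs), which the real data would need
res <- x[seqs, on = "day", allow.cartesian = TRUE]
```

Whether the resulting table fits in memory for 1000 repeated days over 20 million rows is a separate question that depends on how many rows each day has.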
Upvotes: 2