Reputation: 21
I have a data frame of 10 different blocks and I'd like to take only the first 1500 rows of each block. Each block is identified by the ID column below. This is a condensed version to illustrate.
> mdf[1:15,1:4]
         x.1       x.2       x.3 ID
 1: 0.025061 0.0010093 0.0087476  1
 2: 0.025044 0.0010191 0.0087674  1
 3: 0.025023 0.0010280 0.0087716  2
 4: 0.025000 0.0010360 0.0087602  2
 5: 0.024974 0.0010433 0.0087334  2
 6: 0.024944 0.0010497 0.0086911  2
 7: 0.024912 0.0010553 0.0086335  2
 8: 0.024877 0.0010602 0.0085607  3
 9: 0.024839 0.0010643 0.0084728  3
10: 0.024798 0.0010677 0.0083699  3
11: 0.024753 0.0010703 0.0082521  3
12: 0.024706 0.0010723 0.0081197  3
13: 0.024656 0.0010735 0.0079726  4
14: 0.024603 0.0010740 0.0078112  4
15: 0.024546 0.0010739 0.0076356  4
What I'd like is to be able to identify the row at which the ID changes and then keep the next 1500 rows, discarding everything after until the next ID change.
I've tried subsetting by hand using indexing and a for loop to find where the value changes, but I haven't had any luck. Any help would be greatly appreciated!
Upvotes: 2
Views: 69
Reputation: 2764
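Here are two data.table solutions for getting your desired output. Both examples keep the first 2 rows per ID so the result is easy to check on the sample; replace 2 with 1500 for your data, and df with your own data frame (mdf in the question).
library(data.table)
setDT(df)[, head(.SD, 2), by = ID]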
OR
A more efficient way for large datasets, which avoids building .SD for each group: add a per-group row counter, filter on it, then drop the helper column.
setDT(df)[, indx := seq_len(.N), by = ID][indx <= 2][, !"indx", with = FALSE]
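As a quick sanity check, here is a minimal sketch that runs both approaches on made-up data and confirms they keep the same rows. The toy values, the block sizes, the name n_keep and the cutoff of 3 are all assumptions for illustration, not taken from your data.

```r
library(data.table)

# Toy data shaped like the question's sample (values are made up):
# blocks of 2, 5, 5 and 3 rows identified by ID.
df <- data.table(
  x.1 = seq_len(15) / 1000,
  ID  = rep(1:4, times = c(2, 5, 5, 3))
)

n_keep <- 3L  # hypothetical cutoff; use 1500 on the real data

# Approach 1: head() of each group's subset.
a <- df[, head(.SD, n_keep), by = ID]

# Approach 2: per-group row counter, filter, drop the helper column.
# copy() keeps the helper column from being added to df by reference.
b <- copy(df)[, indx := seq_len(.N), by = ID][indx <= n_keep][, !"indx", with = FALSE]

# `by` moves ID to the front in approach 1, so align column order first.
setcolorder(b, names(a))
same_rows <- isTRUE(all.equal(a, b))

# Rows kept per block: min(block size, n_keep).
kept <- a[, .N, by = ID]
```

Blocks shorter than the cutoff are kept whole, so nothing special is needed if one of your 10 blocks has fewer than 1500 rows.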
Upvotes: 1