Reputation: 203
I have this following data.frame
df=data.frame(Abs=c("At1", "At1", "At1", "At2", "At2", "At3", "At4", "At4", "At4", "At4", "At5", "At5", "At6", "At6", "At6"),
vid=c("id1", "id2", "id3", "id4", "id5", "id1", "id1", "id2", "id5", "id6", "id7", "id8", "id8", "id9", "id12"),
ert=c(1,1,2,7,8,4,5,5,5,9,6,5,2,2,2),
Tre= c(1256, 1260, 854, 147,8962, 87452, 1452, 1500, 1502, 147, 210, 258,1400,1850,1865))
> df
Abs vid ert Tre
1 At1 id1 1 1256
2 At1 id2 1 1260
3 At1 id3 2 854
4 At2 id4 7 147
5 At2 id5 8 8962
6 At3 id1 4 87452
7 At4 id1 5 1452
8 At4 id2 5 1500
9 At4 id5 5 1502
10 At4 id6 9 147
11 At5 id7 6 210
12 At5 id8 5 258
13 At6 id8 2 1400
14 At6 id9 2 1850
15 At6 id12 2 1865
I would like to subset some lines based on several conditions comparing different line on the dataframe. The rule to follow is : keep lines that have a same Abs + a different vid + a same ert + an absolute difference in Tre < 300.
Following this rule I would like to obtain this second dataframe
> df2
Abs vid ert Tre
1 At1 id1 1 1256
2 At1 id2 1 1260
3 At4 id1 5 1452
4 At4 id2 5 1500
5 At4 id5 5 1502
6 At6 id9 2 1850
7 At6 id12 2 1865
My issue is to test on lines. Usually I test value for a line between differents variable but don't know how to apply test to compare between lines without using multiple forloop.
Upvotes: 1
Views: 44
Reputation: 66520
Here's a dplyr approach that should work fine for smallish data. If your data is large (my intuition being on the order of >100,000 rows), it will start to make sense to convert to a data.table or sqldf solution. This is a question that can be solved more efficiently with a "non-equi join", which can't currently be done in dplyr, so I'm using a join+filter workaround.
library(dplyr)
df_row <- df %>% mutate(row = row_number())
df_row %>%
left_join(df_row, by = c("Abs", "ert")) %>%
filter(vid.x != vid.y, abs(Tre.x - Tre.y) < 300) %>%
distinct(Abs, vid = vid.x, ert, Tre = Tre.x)
Result
Abs ert vid Tre
1 At1 1 id1 1256
2 At1 1 id2 1260
3 At4 5 id1 1452
4 At4 5 id2 1500
5 At4 5 id5 1502
6 At6 2 id9 1850
7 At6 2 id12 1865
Upvotes: 1