Reputation: 301
I would like to know if there is a way of subsetting a huge R dataframe [df] so that only certain sequences remain for each group [device].
I have a dataframe [df] like this:
id device date pressure
1 B3 2020-04-15 08:00 112
2 B3 2020-04-15 09:00 100
3 B3 2020-04-15 10:00 89
4 B3 2020-04-15 11:00 90
5 B3 2020-04-15 12:00 60
6 B3 2020-04-15 13:00 28
7 B3 2020-04-16 09:00 120
8 B3 2020-04-16 10:00 80
9 B3 2020-04-16 11:00 73
10 B3 2020-04-16 12:00 61
11 B3 2020-04-16 13:00 30
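For reference, the sample data can be reproduced like this (a sketch, assuming the date column is stored as plain character text):
df <- data.frame(
  id = 1:11,
  device = "B3",
  date = c("2020-04-15 08:00", "2020-04-15 09:00", "2020-04-15 10:00",
           "2020-04-15 11:00", "2020-04-15 12:00", "2020-04-15 13:00",
           "2020-04-16 09:00", "2020-04-16 10:00", "2020-04-16 11:00",
           "2020-04-16 12:00", "2020-04-16 13:00"),
  pressure = c(112L, 100L, 89L, 90L, 60L, 28L, 120L, 80L, 73L, 61L, 30L)
)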
I would like to keep only the rows where the pressure drops from its high value down to 60 [or down to the first value lower than 60].
The expected result would be as follows:
id device date pressure group
1 B3 2020-04-15 08:00 112 1
2 B3 2020-04-15 09:00 100 1
3 B3 2020-04-15 10:00 89 1
4 B3 2020-04-15 11:00 90 1
5 B3 2020-04-15 12:00 60 1
7 B3 2020-04-16 09:00 120 2
8 B3 2020-04-16 10:00 80 2
9 B3 2020-04-16 11:00 73 2
10 B3 2020-04-16 12:00 61 2
11 B3 2020-04-16 13:00 30 2
Would this be possible? Thank you for any suggestions.
Upvotes: 0
Views: 39
Reputation: 2894
If you want to do it without dplyr and pipes, you can loop through the pressures to annotate the groups:
df$group <- NA
df$group[1] <- 1
for (i in 2:nrow(df)) {
  if (df$pressure[i] > 60 & df$pressure[i - 1] < 60) {
    # pressure climbed back above 60 after a drop: start a new group
    df$group[i] <- df$group[i - 1] + 1
  } else if (df$pressure[i] > df$pressure[i - 1] & df$pressure[i] < 60) {
    # pressure rises again while still below 60: also start a new group
    df$group[i] <- df$group[i - 1] + 1
  } else {
    # otherwise stay in the current group
    df$group[i] <- df$group[i - 1]
  }
}
In such an if-else if block, you can add as many different conditions as you want (e.g. changing devices, changing dates, ...), as sketched below.
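For example, a minimal sketch of how a device change could also start a new group (the extra device condition is only illustrative, not part of the original logic):
df$group <- NA
df$group[1] <- 1
for (i in 2:nrow(df)) {
  new_group <-
    (df$pressure[i] > 60 & df$pressure[i - 1] < 60) |
    (df$pressure[i] > df$pressure[i - 1] & df$pressure[i] < 60) |
    (df$device[i] != df$device[i - 1])   # illustrative: a new device starts a new group
  df$group[i] <- if (new_group) df$group[i - 1] + 1 else df$group[i - 1]
}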
Upvotes: 0
Reputation: 389012
You can start a new group whenever the current value is greater than 60 and the previous value was below 60, and then select only the rows up to (and including) the first row whose pressure is less than or equal to 60.
library(dplyr)

df %>%
  group_by(device,
           group = cumsum(pressure > 60 & lag(pressure, default = 0) < 60)) %>%
  slice(seq_len(which.max(pressure <= 60)))
# id device date pressure group
# <int> <chr> <chr> <int> <int>
# 1 1 B3 2020-04-15 08:00 112 1
# 2 2 B3 2020-04-15 09:00 100 1
# 3 3 B3 2020-04-15 10:00 89 1
# 4 4 B3 2020-04-15 11:00 90 1
# 5 5 B3 2020-04-15 12:00 60 1
# 6 7 B3 2020-04-16 09:00 120 2
# 7 8 B3 2020-04-16 10:00 80 2
# 8 9 B3 2020-04-16 11:00 73 2
# 9 10 B3 2020-04-16 12:00 61 2
#10 11 B3 2020-04-16 13:00 30 2
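The slice(seq_len(which.max(pressure <= 60))) step works because which.max() on a logical vector returns the position of the first TRUE, i.e. the first pressure at or below 60 within each group. A quick illustration with the first group's pressures:
p <- c(112, 100, 89, 90, 60, 28)
which.max(p <= 60)            # 5: position of the first value at or below 60
seq_len(which.max(p <= 60))   # 1 2 3 4 5: row positions kept by slice()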
Upvotes: 1