Reputation: 417
I have the following data frame containing values for angular change in degrees, one observation per row:
'data.frame': 712801 obs. of 4 variables:
$ time_passed: int 1 2 3 4 5 6 7 8 9 10 ...
$ dRoll : num 0.9798 -0.5099 -0.0974 -0.4985 0.1719 ...
$ dPitch : num -0.175 -0.0655 0.0653 0.8907 -1.0893 ...
$ dYaw : num 0.33232 0.06875 -0.00573 0.59588 -0.55577 ...
> myData[1:20,]
time_passed dRoll dPitch dYaw
1 0.97975783 -0.17498131 0.332315521
2 -0.50993244 -0.06548908 0.068754935
3 -0.09740283 0.06531719 -0.005729578
4 -0.49847328 0.89072019 0.595876107
5 0.17188734 -1.08930736 -0.555769061
6 0.68181978 0.36852645 0.492743704
7 1.07143108 0.15206300 -0.635983153
8 -1.43812407 -0.76638835 -0.509932438
9 0.43544792 0.41241502 0.767763445
10 0.25210143 0.61375239 0.509932438
11 0.38961130 0.01203211 -0.360963411
12 0.03437747 -0.29633377 -0.315126787
13 -0.33804510 -0.40639896 -0.177616916
14 0.68181978 0.32446600 0.435447924
15 -1.12872686 -0.37752189 -0.275019742
16 0.75057471 0.33907642 0.464095814
17 -0.25783101 0.11310187 0.309397209
18 -0.01718873 -0.13435860 -0.521391594
19 0.12605071 0.12817066 -0.085943669
20 0.02291831 -0.59856901 -0.120321137
How would I write something like
"If the absolute sum of a run of consecutive negative (or positive) values is smaller than my threshold (say, a 5° change), then throw that run out of the data set"
in R code?
I would like to apply this criterion to any of the columns: dRoll, dPitch, or dYaw.
In this case, applied based on the dRoll column, the output would be:
time_passed dRoll dPitch dYaw
1 0.97975783 -0.17498131 0.332315521
5 0.17188734 -1.08930736 -0.555769061
6 0.68181978 0.36852645 0.492743704
7 1.07143108 0.15206300 -0.635983153
9 0.43544792 0.41241502 0.767763445
10 0.25210143 0.61375239 0.509932438
11 0.38961130 0.01203211 -0.360963411
12 0.03437747 -0.29633377 -0.315126787
14 0.68181978 0.32446600 0.435447924
16 0.75057471 0.33907642 0.464095814
19 0.12605071 0.12817066 -0.085943669
20 0.02291831 -0.59856901 -0.120321137
All negative runs in dRoll were thrown out, because the sums of consecutive negative values were smaller in magnitude than 5 degrees:
sum(myData[2:4,2]) = -1.105809
-1.43812, -0.33804, -1.12872 (single-value runs in rows 8, 13 and 15)
sum(myData[17:18,2]) = -0.2750197
How would one do that in R?
Upvotes: 3
Views: 584
Reputation: 83215
My advice would be to melt your data frame into long format first. After that, grouped operations become much easier.
Using the data.table package (which we need for the melt and rleid functions):
# load the package
library(data.table)
# convert the data.frame from the question to a data.table
DT <- as.data.table(myData)
# melt into long format
DT2 <- melt(DT, id.vars = 'time_passed')
# create a cumulative sum for each run
# 'rleid(value > 0)' creates a grouping variable for runs of consecutive positive/negative values
# by adding '[.N]' to 'cumsum(value)' you set all values in 'csum' to the last value of the
# cumulative sum, i.e. the total of each run, which we can use to filter the data
DT2[, csum := cumsum(value)[.N], by = .(variable, rleid(value > 0))]
# filter the data according to a rule
# in this case all runs with a total between -1.2 and -0.2 are filtered out
DT2[csum < -1.2 | csum > -0.2]
which gives (a snapshot of the result):
time_passed variable value csum
1: 1 dRoll 0.979757830 0.979757830
2: 5 dRoll 0.171887340 1.925138200
3: 6 dRoll 0.681819780 1.925138200
4: 7 dRoll 1.071431080 1.925138200
5: 8 dRoll -1.438124070 -1.438124070
6: 9 dRoll 0.435447920 1.111538120
....
....
14: 3 dPitch 0.065317190 0.956037380
15: 4 dPitch 0.890720190 0.956037380
16: 6 dPitch 0.368526450 0.520589450
17: 7 dPitch 0.152063000 0.520589450
18: 9 dPitch 0.412415020 1.038199520
19: 10 dPitch 0.613752390 1.038199520
....
....
26: 1 dYaw 0.332315521 0.401070456
27: 2 dYaw 0.068754935 0.401070456
28: 3 dYaw -0.005729578 -0.005729578
29: 4 dYaw 0.595876107 0.595876107
30: 6 dYaw 0.492743704 0.492743704
31: 9 dYaw 0.767763445 1.277695883
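If you would rather keep the wide format and filter on a single column with an explicit threshold, the same run-grouping idea can be sketched as below. This is only a minimal sketch: drop_small_runs is a hypothetical helper name (not part of data.table), the toy data are the first 10 dRoll values from the question, and the threshold of 1 degree is an assumption chosen so that some rows survive (with 5 degrees every run in this sample would be dropped).

```r
library(data.table)

# Hypothetical helper (not part of data.table): drops the rows of every run of
# same-signed values in `col` whose absolute sum is below `threshold`.
drop_small_runs <- function(dt, col, threshold) {
  dt <- as.data.table(dt)
  # one id per run of consecutive positive / non-positive values
  # (zeros are grouped with the negative values here)
  run_id <- rleid(dt[[col]] > 0)
  # total of each run, repeated for every row of that run
  run_sum <- ave(dt[[col]], run_id, FUN = sum)
  # keep only the rows whose run total reaches the threshold
  dt[abs(run_sum) >= threshold]
}

# first 10 rows of the question's data (dRoll only)
toy <- data.table(
  time_passed = 1:10,
  dRoll = c(0.97975783, -0.50993244, -0.09740283, -0.49847328, 0.17188734,
            0.68181978, 1.07143108, -1.43812407, 0.43544792, 0.25210143)
)

# assumed threshold of 1 degree, applied to positive and negative runs alike
res <- drop_small_runs(toy, "dRoll", threshold = 1)
print(res)
```

Unlike the expected output in the question, this treats positive and negative runs symmetrically, so the short positive runs in rows 1 and 9 to 10 are dropped as well. To filter on another column, pass "dPitch" or "dYaw" as col.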
Upvotes: 4