Reputation: 337
I have a dataset with Time.Interval, Net.Chg, and Tick.Count columns. Net.Chg contains positive, negative, and zero values. Based on the sign of Net.Chg, I want to sum the values in Tick.Count separately for positive, negative, and zero changes, grouped by date.
Time.Interval Net.Chg Tick.Count
2-Jan-17 NA NA
19:15 - 19:16 -0.0047 7
19:16 - 19:17 0 8
19:17 - 19:18 0.0025 10
3-Jan-17 NA NA
03:45 - 03:46 0 1
03:54 - 03:55 -0.0002 2
19:43 - 19:44 -0.0008 4
20:01 - 20:02 0.0025 2
4-Jan-17 NA NA
00:54 - 00:55 -0.0007 2
01:10 - 01:11 0.0005 1
01:11 - 01:12 0 1
Time.Interval <- c('2-Jan-17 _00:00:00.000000', '19:15 - 19:16', '19:16 - 19:17', '19:17 - 19:18', '3-Jan-17 _00:00:00.000000', '03:45 - 03:46', '03:54 - 03:55', '19:43 - 19:44', '20:01 - 20:02', '4-Jan-17 _00:00:00.000000', '00:54 - 00:55', '01:10 - 01:11', '01:11 - 01:12')
Net.Chg <- c(NA, -0.0047, 0, 0.0025, NA, 0, -0.0002, -0.0008, 0.0025, NA, -0.0007, 0.0005, 0)
Tick.Count <- c(NA, 7, 8, 10, NA, 1, 2, 4, 2, NA, 2, 1, 1)
data <- data.frame(Time.Interval, Net.Chg, Tick.Count)
The required output, per date, is:
pos = sum of Tick.Count where Net.Chg > 0
neg = sum of Tick.Count where Net.Chg < 0
UnChng = sum of Tick.Count where Net.Chg == 0
OF = pos - neg
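As a cross-check of the definitions above, here is a minimal base-R sketch (no dplyr) that computes the four columns for each date block. It assumes that the date header rows are the only rows whose Time.Interval contains a capital letter, the same assumption the attempted code makes with str_detect. The names grp and summ are illustrative only.

```r
# Sample data from the question (header rows carry NA in Net.Chg and Tick.Count)
Time.Interval <- c('2-Jan-17 _00:00:00.000000', '19:15 - 19:16', '19:16 - 19:17', '19:17 - 19:18',
                   '3-Jan-17 _00:00:00.000000', '03:45 - 03:46', '03:54 - 03:55', '19:43 - 19:44',
                   '20:01 - 20:02', '4-Jan-17 _00:00:00.000000', '00:54 - 00:55', '01:10 - 01:11',
                   '01:11 - 01:12')
Net.Chg <- c(NA, -0.0047, 0, 0.0025, NA, 0, -0.0002, -0.0008, 0.0025, NA, -0.0007, 0.0005, 0)
Tick.Count <- c(NA, 7, 8, 10, NA, 1, 2, 4, 2, NA, 2, 1, 1)
data <- data.frame(Time.Interval, Net.Chg, Tick.Count, stringsAsFactors = FALSE)

# Each date header row (containing a capital letter) starts a new block
grp <- cumsum(grepl("[A-Z]", data$Time.Interval))

# Per-block totals following the definitions of pos, neg, UnChng and OF
summ <- do.call(rbind, lapply(split(data, grp), function(d) {
  pos   <- sum(d$Tick.Count[d$Net.Chg > 0], na.rm = TRUE)
  neg   <- sum(d$Tick.Count[d$Net.Chg < 0], na.rm = TRUE)
  unChg <- sum(d$Tick.Count[d$Net.Chg == 0], na.rm = TRUE)
  data.frame(Date = sub(" .*", "", d$Time.Interval[1]),  # keep the "2-Jan-17" part
             pos = pos, neg = neg, unChg = unChg, OF = pos - neg,
             stringsAsFactors = FALSE)
}))
print(summ, row.names = FALSE)
```

The na.rm = TRUE arguments are needed because the header row of each block contributes an NA.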
I tried the following code:
library(anytime)
library(tidyverse)
DF <- data %>%
group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>%
summarise(Time.Interval = anydate(first(Time.Interval)),
pos = sum((Net.Chg > 0)* Tick.Count, na.rm = T),
neg = sum((Net.Chg < 0) * Tick.Count, na.rm = T),
unChg = sum(Net.Chg ==0 * Tick.Count, na.rm=T),
OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE))
This code gives correct values for pos, neg, and OF, but the UnChng value is wrong.
The current output is
Time.Interval pos Neg UnChng OF
02Jan2017 10 7 4 3
03Jan2017 2 6 5 -4
04Jan2017 1 2 4 -1
whereas the expected output is
Time.Interval pos Neg UnChng OF
02Jan2017 10 7 8 3
03Jan2017 2 6 1 -4
04Jan2017 1 2 1 -1
I also tried sum(Net.Chg ==0 + Tick.Count, na.rm=T) and length(Net.Chg ==0 * Tick.Count), but neither worked.
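The failed attempts come down to R's operator precedence: * and + bind more tightly than ==, so Net.Chg ==0 * Tick.Count is parsed as Net.Chg == (0 * Tick.Count), which is just a test for zero, and summing it counts rows instead of adding up their ticks. A minimal check on the first date block of the sample data (the variable names a and b are illustrative):

```r
# First date block from the question (the header row carries NA)
Net.Chg    <- c(NA, -0.0047, 0, 0.0025)
Tick.Count <- c(NA, 7, 8, 10)

# * binds more tightly than ==, so these two lines are the same test
a <- Net.Chg == 0 * Tick.Count
b <- Net.Chg == (0 * Tick.Count)
identical(a, b)                                   # TRUE

sum(a, na.rm = TRUE)                              # 1: counts the zero rows
sum(Net.Chg == 0 + Tick.Count, na.rm = TRUE)      # 0: compares Net.Chg with Tick.Count
length(a)                                         # 4: just the vector length

# Parenthesising the comparison gives the intended weighted sum
sum((Net.Chg == 0) * Tick.Count, na.rm = TRUE)    # 8
```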
Upvotes: 2
Views: 93
Reputation: 32426
You should never use == when comparing floating-point numbers, because of rounding error. R has functions such as all.equal (which allows a tolerance) and identical (which is exact), or you can simply check for small differences, e.g.
library(anytime)
library(tidyverse)
DF <- data %>%
group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>%
summarise(Time.Interval = anydate(first(Time.Interval)),
pos = sum((Net.Chg > 0) * Tick.Count, na.rm = TRUE),
neg = sum((Net.Chg < 0) * Tick.Count, na.rm = TRUE),
unChg = sum((abs(Net.Chg) < 1e-15) * Tick.Count, na.rm = TRUE),
OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE))
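A short illustration of the float-comparison point. For the literal zeros in the sample data, == 0 happens to be exact; a tolerance matters when Net.Chg is itself the result of arithmetic. Note that identical() is also an exact comparison, so it is all.equal() or an explicit tolerance that absorbs rounding error. The variable x is illustrative.

```r
x <- 0.1 + 0.2

x == 0.3                    # FALSE: binary rounding error
identical(x, 0.3)           # FALSE: identical() is exact as well
isTRUE(all.equal(x, 0.3))   # TRUE: all.equal() allows a small tolerance
abs(x - 0.3) < 1e-12        # TRUE: explicit tolerance check

# The same tolerance idea applied to the first date block of the question
Net.Chg    <- c(NA, -0.0047, 0, 0.0025)
Tick.Count <- c(NA, 7, 8, 10)
sum((abs(Net.Chg) < 1e-15) * Tick.Count, na.rm = TRUE)   # 8
```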
It's also considered bad practice to use T instead of TRUE, because T is an ordinary variable that can be reassigned to any value, whereas TRUE is a reserved word.
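A quick demonstration of why T is risky. Run it in a fresh session, since it temporarily masks T in the global environment; the names v, before, after and safe are illustrative.

```r
v <- c(NA, 7, 8)

before <- sum(v, na.rm = T)     # 15: T still refers to base::T, i.e. TRUE

T <- FALSE                      # T is an ordinary variable, so this is allowed
after  <- sum(v, na.rm = T)     # NA: na.rm is now FALSE
safe   <- sum(v, na.rm = TRUE)  # 15: TRUE is a reserved word and cannot be reassigned

rm(T)                           # drop the global T so base::T is visible again
```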
Upvotes: 2
Reputation: 388982
You need to select the Tick.Count values where Net.Chg == 0 and sum them.
library(anytime)
library(tidyverse)
data %>%
group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>%
summarise(Time.Interval = anydate(first(Time.Interval)),
pos = sum((Net.Chg > 0)* Tick.Count, na.rm = TRUE),
neg = sum((Net.Chg < 0) * Tick.Count, na.rm = TRUE),
unChg = sum(Tick.Count[Net.Chg ==0], na.rm = TRUE),
OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE)) %>%
ungroup() %>%
select(-grp)
# Time.Interval pos neg unChg OF
# <date> <dbl> <dbl> <dbl> <dbl>
#1 02Jan2017 10 7 8 3
#2 03Jan2017 2 6 1 -4
#3 04Jan2017 1 2 1 -1
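The unChg line above needs na.rm = TRUE because logical indexing with an NA (the date header row) returns an NA element. A minimal sketch on the first date block of the sample data; which() is shown as an alternative that drops the NA positions. The name picked is illustrative.

```r
# First date block from the question (the header row carries NA)
Net.Chg    <- c(NA, -0.0047, 0, 0.0025)
Tick.Count <- c(NA, 7, 8, 10)

# Logical indexing: the NA comparison yields an NA element
picked <- Tick.Count[Net.Chg == 0]
picked                                   # NA 8
sum(picked)                              # NA
sum(picked, na.rm = TRUE)                # 8

# which() returns only the TRUE positions, so no NA is carried along
sum(Tick.Count[which(Net.Chg == 0)])     # 8
```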
Upvotes: 0