Dr. Abrar

Reputation: 337

How to add values in one column that have '0' in another column

I have a dataset with Time.Interval, Net.Chg, and Tick.Count columns. Net.Chg can be positive, negative, or zero. Based on the sign of Net.Chg, I want to sum the values in Tick.Count separately for positive, negative, and zero changes, grouped by date.

Time.Interval    Net.Chg    Tick.Count
2-Jan-17         NA         NA
19:15 - 19:16    -0.0047    7
19:16 - 19:17    0          8
19:17 - 19:18    0.0025     10
3-Jan-17         NA         NA
03:45 - 03:46    0          1
03:54 - 03:55    -0.0002    2
19:43 - 19:44    -0.0008    4
20:01 - 20:02    0.0025     2
4-Jan-17         NA         NA
00:54 - 00:55    -0.0007    2
01:10 - 01:11    0.0005     1
01:11 - 01:12    0          1

Time.Interval <- c('2-Jan-17 _00:00:00.000000', '19:15 - 19:16', '19:16 - 19:17', '19:17 - 19:18',
                   '3-Jan-17 _00:00:00.000000', '03:45 - 03:46', '03:54 - 03:55', '19:43 - 19:44', '20:01 - 20:02',
                   '4-Jan-17 _00:00:00.000000', '00:54 - 00:55', '01:10 - 01:11', '01:11 - 01:12')
Net.Chg <- c(NA, -0.0047, 0, 0.0025, NA, 0, -0.0002, -0.0008, 0.0025, NA, -0.0007, 0.0005, 0)
Tick.Count <- c(NA, 7, 8, 10, NA, 1, 2, 4, 2, NA, 2, 1, 1)
data <- data.frame(Time.Interval, Net.Chg, Tick.Count)

The required output is

pos = sum of Tick.Count where Net.Chg > 0
neg = sum of Tick.Count where Net.Chg < 0
unChg = sum of Tick.Count where Net.Chg == 0
OF = pos - neg

I tried the following code

DF <- dd %>%
  group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>%
  summarise(Time.Interval = anydate(first(Time.Interval)),
            pos = sum((Net.Chg > 0)* Tick.Count, na.rm = T),
            neg = sum((Net.Chg < 0) * Tick.Count, na.rm = T),
            unChg = sum(Net.Chg ==0 * Tick.Count, na.rm=T),
            OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE))

This code gives me correct values for pos, neg, and OF, but the unChg value is wrong.

The current output is

Time.Interval      pos    Neg     UnChng     OF
02Jan2017          10     7       4           3      
03Jan2017          2      6       5          -4
04Jan2017          1      2       4          -1

while the actual output should be

Time.Interval      pos    Neg     UnChng     OF
02Jan2017          10     7       8           3      
03Jan2017          2      6       1          -4
04Jan2017          1      2       1          -1

I also tried sum(Net.Chg ==0 + Tick.Count, na.rm=T) and length(Net.Chg ==0 * Tick.Count), but neither worked.

Upvotes: 2

Views: 93

Answers (2)

Rorschach

Reputation: 32426

You should avoid == when comparing floats because of precision error. R has all.equal for comparing within a tolerance (identical only tests exact equality), or you can just check for small errors yourself, e.g.

DF <- dd %>% 
    group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>% 
    summarise(Time.Interval = anydate(first(Time.Interval)), 
        pos = sum((Net.Chg > 0)* Tick.Count, na.rm = TRUE),  
        neg = sum((Net.Chg < 0) * Tick.Count, na.rm = TRUE), 
        unChg = sum((abs(Net.Chg)-0 < 1e-15) * Tick.Count, na.rm=TRUE), 
        OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE))

It's also considered bad practice to use T instead of TRUE as the former can be set to any value.
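
As a rough illustration of both points, here is a minimal base R sketch. The values 0.1, 0.2 and 0.3 and the tolerance 1e-9 are assumptions chosen for the demonstration; they do not come from the question's data.

```r
# Illustrative values only: 0.1 + 0.2 is not stored exactly as 0.3
x <- 0.1 + 0.2

exact    <- x == 0.3                      # exact comparison fails here
tolerant <- isTRUE(all.equal(x, 0.3))     # comparison within all.equal's default tolerance
near     <- abs(x - 0.3) < 1e-9           # manual check against an assumed tolerance

# T is an ordinary variable that can be reassigned; TRUE is a reserved word
T <- 0
masked <- isTRUE(T)                       # T no longer means TRUE
rm(T)                                     # remove the override again
```

Here exact and masked are FALSE, while tolerant and near are TRUE. Which tolerance is appropriate depends on the scale of the data.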

Upvotes: 2

Ronak Shah

Reputation: 388982

You need to subset Tick.Count to the rows where Net.Chg == 0 and sum those values.

library(anytime)
library(tidyverse)

data %>% 
  group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>% 
  summarise(Time.Interval = anydate(first(Time.Interval)), 
            pos = sum((Net.Chg > 0)* Tick.Count, na.rm = TRUE),  
            neg = sum((Net.Chg < 0) * Tick.Count, na.rm = TRUE), 
            unChg = sum(Tick.Count[Net.Chg ==0], na.rm = TRUE), 
            OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-grp)

#  Time.Interval   pos   neg unChg    OF
#  <date>        <dbl> <dbl> <dbl> <dbl>
#1 02Jan2017       10     7     8     3
#2 03Jan2017        2     6     1    -4
#3 04Jan2017        1     2     1    -1
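
One likely reason the original expression misbehaves is operator precedence: in R, * binds more tightly than ==, so Net.Chg ==0 * Tick.Count is parsed as Net.Chg == (0 * Tick.Count). The base R sketch below shows this on the three 2-Jan-17 rows from the question; the lowercase variable names are only for illustration.

```r
# The three 2-Jan-17 rows from the question's sample data
net_chg    <- c(-0.0047, 0, 0.0025)
tick_count <- c(7, 8, 10)

# Parsed as net_chg == (0 * tick_count), so this only counts the zero rows
as_written  <- sum(net_chg == 0 * tick_count)

# Parentheses force the comparison first, then weight by tick_count
with_parens <- sum((net_chg == 0) * tick_count)

# Subsetting, as in the answer above, gives the same total
by_subset   <- sum(tick_count[net_chg == 0])
```

On these rows as_written is 1, while with_parens and by_subset are both 8, which matches the expected unChg for 2-Jan-17.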

Upvotes: 0
