Reputation: 1305
I am wondering how I can perform this formula in R. It is counting over and equal to 400 and also counting under and equal to 400.
The formula is below for excel:
=SUM(COUNTIFS(B$2:$B$1048576,{"<=-400",">=400"},A$2:$A$1048576,"="&C2))
B2 column: Data A2 Column: Date C2 Column Matching Date
Here is an example date frame:
A (time series date) B (data) C (Numerical Date)
20140219 -412 20140219
20140219 -273.6666667 20140220
20140219 -228 20140221
20140219 -151.3333333 20140224
20140219 -157 20140225
20140219 -177.3333333 20140226
20140219 81.66666667 20140227
20140219 40 20140228
20140219 93.33333333 20140303
20140219 201.6666667 20140304
20140219 203.6666667 20140305
20140219 69 20140306
20140219 -22.66666667 20140307
20140219 49.33333333 20140310
20140219 40.33333333 20140311
20140219 194 20140312
20140219 74 20140313
20140219 333.3333333 20140314
20140219 283.6666667 20140317
20140219 176.6666667 20140318
20140219 219 20140319
20140219 276.6666667 20140320
20140219 107 20140321
20140219 138.3333333 20140324
As you can see... my time series date is in column A and this extends on a 1 minute basis all the way to present day. The data is in column b and column C is a helper column to aid pulling the data for that particular date, on matching date in that column... it then searches the data column for values over and under 400.
The goal here is to for each particular date, to count the numbers <=-400",">=400 for that given date. After its done one day, it then moves onto the next and populates the data frame with the results.
An example out put is below:
date Count 400
20140819 25
20140820 50
20140821 46
20140822 48
20140825 43
20140826 17
20140827 18
20140828 18
20140829 44
20140902 57
In the sample data frame, the result would be total count 24 for date 20140219.
Upvotes: 0
Views: 102
Reputation: 13581
Using dplyr
. Starting with example data:
set.seed(1)
df <- data.frame(Date=c(rep("20170101",5), rep("20170102",5)),
Value=((runif(10)*1600)-800),
stringsAsFactors=F)
head(df)
Date Value
1 20170101 -375.1861
2 20170101 -204.6018
3 20170101 116.5654
4 20170101 653.1325
5 20170101 -477.3089
Use abs(Value) >= 400
to find values that are <=-400
or >=400
, then group_by
the Date
, and count n()
.
library(dplyr)
df %>%
filter(abs(Value) >= 400) %>%
group_by(Date) %>%
summarise(Count=n())
Output:
Date Count
1 20170101 2
2 20170102 3
Upvotes: 2
Reputation: 3194
Using base R I would do the following:
Chi's Fake Data (doing the heaven's work thank you):
df <- data.frame(A = c(rep("20170101", 5), rep("20170102", 5)),
B = ((runif(10) * 1600) - 800), stringsAsFactors = F)
df$D <- ifelse(abs(df$Value) >= 400, T, F) # Boolean column
sapply(split(df$D, df$Date), sum) # split and sum
# 20170101 20170102
# 2 3
Or simply run the following function on your data:
# much larger data (2206800 rows)
df <- data.frame(Date = as.character(rep(20170101:20170713, each = 60*60)), Value =((runif(2206800)*1600)-800), stringsAsFactors=F)
sumcountif <- function(value, date, abs.cutoff) {
m <- lapply(split(ifelse(abs(value) >= abs.cutoff, T, F), date), sum)
d <- cbind.data.frame(date = names(m), unlist(m), stringsAsFactors = F)
rownames(d) <- NULL
colnames(d)[2] <- paste0("count_",substitute(abs.cutoff))
d
}
sumcountif(df$Value, df$Date, 400) # inputs here
Upvotes: 0