Reputation: 13
I am trying to sum between two dates in a data frame using an "if" statement.
date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)
if(df$date >= as.Date("2000-01-01") && df$date <= as.Date("2000-01-07")){
sum(df$nums)
}
However, the output is "31" rather than "7" as I would expect. Is there a better way to sum by date? I would like to use an "if" statement because I would like to apply this to a much larger dataset with many different columns and within different lengths of time.
Upvotes: 0
Views: 73
Reputation: 10875
...and illustrating the diversity of R, here's a solution using sqldf
.
date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)
startDate <- as.Date("2000-01-01")
endDate <- as.Date("2000-01-07")
library(sqldf)
fn$sqldf("select sum(nums) from df where date between $startDate and $endDate")
and the output:
> fn$sqldf("select sum(nums) from df where date between $startDate and $endDate")
sum(nums)
1 7
>
Upvotes: 1
Reputation: 263481
The if
-function in R is not vectorized and neither is the "&&"-operator. The usual way to employ logical subsetting is to the vectorized operator "&" and put it in the first argument to "[":
sum(df[ df$date >= as.Date("2000-01-01") & df$date <= as.Date("2000-01-07"),
#That is a logical vector in the row selection position.
"nums"]) # The second argument to "[" is/are the column(s) to be selected.
#[1] 7
Upvotes: 1
Reputation: 13354
Just use this function:
sum_by_dates <- function(frame, date_column, num_column, date1, date2) {
sub_vec <- frame[[date_column]][frame[[date_column]] >= as.Date(date1) & frame[[date_column]] <= as.Date(date2)]
df_new <- subset(frame, frame[[date_column]] %in% sub_vec)
tot <- sum(df_new[[num_column]])
return(tot)
}
Usage:
sum_by_dates(df, 'date', 'nums', '2000-01-01', '2000-01-07')
Upvotes: 1
Reputation: 136
library(dplyr)
# your data
date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)
# answer
df %>%
filter(date >= '2000-01-01' & date <= '2000-01-07') %>%
summarize(sum = sum(nums))
Upvotes: 1
Reputation: 887881
We can just do the sum
on the logical vector. Note that we are using only a single &
to return a logical vector.
sum(df$date >= as.Date("2000-01-01") & df$date <= as.Date("2000-01-07"))
If the value of 'nums' are not all 1s, then subset the 'nums' based on the logical vector and get the sum`
sum(df$nums[df$date >= as.Date("2000-01-01") & df$date <= as.Date("2000-01-07")])
Upvotes: 2