fourdegreeswester
fourdegreeswester

Reputation: 13

Sum by date using "if" statement

I am trying to sum between two dates in a data frame using an "if" statement.

date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)

if(df$date >= as.Date("2000-01-01") && df$date <= as.Date("2000-01-07")){
  sum(df$nums)
}

However, the output is "31" rather than "7" as I would expect. Is there a better way to sum by date? I would like to use an "if" statement because I would like to apply this to a much larger dataset with many different columns and within different lengths of time.

Upvotes: 0

Views: 73

Answers (5)

Len Greski
Len Greski

Reputation: 10875

...and illustrating the diversity of R, here's a solution using sqldf.

date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)

startDate <- as.Date("2000-01-01")
endDate <- as.Date("2000-01-07")
library(sqldf)
fn$sqldf("select sum(nums) from df where date between $startDate and $endDate")

and the output:

> fn$sqldf("select sum(nums) from df where date between $startDate and $endDate")
   sum(nums)
1         7
> 

Upvotes: 1

IRTFM
IRTFM

Reputation: 263481

The if-function in R is not vectorized and neither is the "&&"-operator. The usual way to employ logical subsetting is to the vectorized operator "&" and put it in the first argument to "[":

sum(df[ df$date >= as.Date("2000-01-01") & df$date <= as.Date("2000-01-07"), 
   #That is a logical vector in the row selection position.
    "nums"])  #   The second argument to "[" is/are the column(s) to be selected.
#[1] 7

Upvotes: 1

Cybernetic
Cybernetic

Reputation: 13354

Just use this function:

sum_by_dates <- function(frame, date_column, num_column, date1, date2) {
  sub_vec <- frame[[date_column]][frame[[date_column]] >= as.Date(date1) & frame[[date_column]] <= as.Date(date2)]
  df_new <- subset(frame, frame[[date_column]] %in% sub_vec)
  tot <- sum(df_new[[num_column]])
  return(tot)
}

Usage:

sum_by_dates(df, 'date', 'nums', '2000-01-01', '2000-01-07')

Upvotes: 1

AlphaDrivers
AlphaDrivers

Reputation: 136

library(dplyr)

# your data
date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)

# answer
df %>%
  filter(date >= '2000-01-01' & date <= '2000-01-07') %>%
  summarize(sum = sum(nums))

Upvotes: 1

akrun
akrun

Reputation: 887881

We can just do the sum on the logical vector. Note that we are using only a single & to return a logical vector.

sum(df$date >= as.Date("2000-01-01") & df$date <= as.Date("2000-01-07"))

If the value of 'nums' are not all 1s, then subset the 'nums' based on the logical vector and get the sum`

sum(df$nums[df$date >= as.Date("2000-01-01") & df$date <= as.Date("2000-01-07")])

Upvotes: 2

Related Questions