Katy

Reputation: 41

Is there a function in R that will sum values by day of year?

I have a data table (Precip15) with columns for precipitation, day of year (DOY), and Date_Time in POSIXct format. I need to see the total precipitation (Rain_cm) for every day recorded. Any suggestions?

An example of the data table format looks like this:

DOY     Rain     Rain_cm    Date_Time
179      6         0.6      2019-06-28 15:00:00
179      0         NA       2019-06-28 15:15:00
179      2         0.2      2019-06-28 16:45:00
180      0         NA       2019-06-29 10:00:00
180      10.2      1.2      2019-06-29 10:15:00
180      2         0.2      2019-06-29 13:00:00

I need it to look like this:

DOY     Rain_cm    
179     0.8      
180     1.4

or possibly:

Date           Rain_cm
2019-06-28       0.8 
2019-06-29       1.4

Thanks in advance for any help!

Upvotes: 2

Views: 2746

Answers (3)

G. Grothendieck

Reputation: 269526

Here are some base R solutions using the data frame DF defined reproducibly in the Note at the end. Solutions based on dplyr, data.table or zoo packages would be possible as well.

1) aggregate: Aggregate on DOY or on Date (defined in the transform statement below), depending on which output you want. Note that the formula method of aggregate removes rows containing NAs by default.

aggregate(Rain_cm ~ DOY, DF, sum)
##   DOY Rain_cm
## 1 179     0.8
## 2 180     1.4

DF2 <- transform(DF, Date = as.Date(Date_Time))
aggregate(Rain_cm ~ Date, DF2, sum)
##         Date Rain_cm
## 1 2019-06-28     0.8
## 2 2019-06-29     1.4
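A caveat on that NA handling: a day whose Rain_cm readings are all NA disappears from the result instead of showing up as 0. If you would rather keep such days, one option (a sketch, not part of the answer above) is na.action = na.pass combined with sum(na.rm = TRUE). The small data frame rain and its all-NA DOY 181 are made up purely to show the difference.

```r
# Hypothetical cut-down data: DOY 181 has only NA readings
rain <- data.frame(
  DOY     = c(179, 179, 179, 180, 180, 180, 181),
  Rain_cm = c(0.6, NA, 0.2, NA, 1.2, 0.2, NA)
)

# Default na.action = na.omit: NA rows are removed first, so DOY 181 vanishes
dropped <- aggregate(Rain_cm ~ DOY, rain, sum)

# na.pass keeps the NA rows; sum(na.rm = TRUE) then reports 0 for DOY 181
kept <- aggregate(Rain_cm ~ DOY, rain, sum, na.rm = TRUE, na.action = na.pass)

dropped
kept
```

Whether an all-NA day should read 0 or be absent depends on whether NA means "no rain" or "no data" for your logger.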

2) rowsum: Another base R solution is rowsum, which returns a one-column matrix whose row names are the values of the grouping variable. DF2 is from (1).

with(na.omit(DF), rowsum(Rain_cm, DOY))
##     [,1]
## 179  0.8
## 180  1.4

with(na.omit(DF2), rowsum(Rain_cm, Date))
##            [,1]
## 2019-06-28  0.8
## 2019-06-29  1.4
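If you want a data frame instead of a matrix, the row names can be moved back into a column. The sketch below assumes a cut-down copy of the data named rain (illustrative only) and uses rowsum's own na.rm argument in place of na.omit(); unlike na.omit(DF), that only ignores NAs in Rain_cm, so rows with NAs in other columns are not dropped.

```r
# Illustrative cut-down copy of the question's data
rain <- data.frame(
  DOY     = c(179, 179, 179, 180, 180, 180),
  Rain_cm = c(0.6, NA, 0.2, NA, 1.2, 0.2)
)

# rowsum() can skip NAs itself, so na.omit() on the whole data frame is not needed
m <- with(rain, rowsum(Rain_cm, DOY, na.rm = TRUE))

# Move the row names (the DOY values) into a column of a data frame
daily <- data.frame(DOY = as.numeric(rownames(m)), Rain_cm = m[, 1],
                    row.names = NULL)
daily
```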

3) tapply: Another base R approach is tapply, which produces a named numeric vector. DF2 is from (1).

with(DF, tapply(Rain_cm, DOY, sum, na.rm = TRUE))
## 179 180 
## 0.8 1.4 

with(DF2, tapply(Rain_cm, Date, sum, na.rm = TRUE))
## 2019-06-28 2019-06-29 
##        0.8        1.4
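The named vector from tapply can likewise be reshaped into the two-column layout shown in the question. A sketch on the same kind of cut-down, illustrative rain data:

```r
# Illustrative cut-down copy of the question's data
rain <- data.frame(
  DOY     = c(179, 179, 179, 180, 180, 180),
  Rain_cm = c(0.6, NA, 0.2, NA, 1.2, 0.2)
)

# tapply() returns a 1-d array whose names are the DOY values
s <- with(rain, tapply(Rain_cm, DOY, sum, na.rm = TRUE))

# Reshape into DOY / Rain_cm columns
daily <- data.frame(DOY = as.numeric(names(s)), Rain_cm = as.vector(s))
daily
```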

4) xtabs: xtabs can be used to form an xtabs table object. DF2 is from (1).

xtabs(Rain_cm ~ DOY, DF)
## DOY
## 179 180 
## 0.8 1.4 

xtabs(Rain_cm ~ Date, DF2)
## Date
## 2019-06-28 2019-06-29 
##        0.8        1.4 
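To get a data frame out of the xtabs result, as.data.frame() works on it as on any table, and its responseName argument names the summed column. A sketch on cut-down, illustrative data; note that DOY comes back as a factor, so wrap it in as.numeric(as.character(...)) if you need numbers.

```r
# Illustrative cut-down copy of the question's data
rain <- data.frame(
  DOY     = c(179, 179, 179, 180, 180, 180),
  Rain_cm = c(0.6, NA, 0.2, NA, 1.2, 0.2)
)

# xtabs() sums Rain_cm within each DOY; NA rows are dropped by default
tab <- xtabs(Rain_cm ~ DOY, rain)

# One row per table cell; responseName replaces the default "Freq" column name
daily <- as.data.frame(tab, responseName = "Rain_cm")
daily
```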

Note

The data in reproducible form is assumed to be:

Lines <- "DOY     Rain     Rain_cm    Date_Time
179      6         0.6      2019-06-28 15:00:00
179      0         NA       2019-06-28 15:15:00
179      2         0.2      2019-06-28 16:45:00
180      0         NA       2019-06-29 10:00:00
180      10.2      1.2      2019-06-29 10:15:00
180      2         0.2      2019-06-29 13:00:00"
# Turn runs of 2+ spaces into commas; the single space inside Date_Time survives
DF <- read.csv(text = gsub("  +", ",", Lines))
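The Note reads Date_Time as character, while the question says it is POSIXct. If you convert it yourself, it may be worth giving as.Date() the same time zone explicitly rather than relying on its default, so readings near midnight are assigned to the right local day. A sketch, assuming the logger recorded in a single fixed zone; "America/Denver" is a placeholder, not something taken from the question.

```r
# Illustrative cut-down copy of the question's data
rain <- data.frame(
  Rain_cm   = c(0.6, NA, 0.2, NA, 1.2, 0.2),
  Date_Time = c("2019-06-28 15:00:00", "2019-06-28 15:15:00",
                "2019-06-28 16:45:00", "2019-06-29 10:00:00",
                "2019-06-29 10:15:00", "2019-06-29 13:00:00")
)

# Hypothetical logger time zone; substitute the zone your data were recorded in
logger_tz <- "America/Denver"
rain$Date_Time <- as.POSIXct(rain$Date_Time, tz = logger_tz)

# Give as.Date() the same zone so each reading keeps its local calendar day
rain$Date <- as.Date(rain$Date_Time, tz = logger_tz)
daily <- aggregate(Rain_cm ~ Date, rain, sum)
daily
```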

Upvotes: 3

Matthew

Reputation: 38

You can use the aggregate and cut functions to calculate your total daily precip values. The following code will provide you with the desired results:

# Build a day factor with cut(), then sum Rain_cm within each day
precipTotals <- aggregate(Rain_cm ~ Day,
                          data = transform(df, Day = cut(Date_Time, breaks = "day")),
                          FUN = sum, na.rm = TRUE)

Make sure your precipitation columns are numeric and Date_Time is POSIXct (convert with as.numeric() and as.POSIXct() if needed); cut() needs a date-time class to split the readings by day.
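Here is a self-contained version of that idea, with the timestamps built as POSIXct in UTC (an assumption made only for the example). Printing the levels shows what cut(..., breaks = "day") produces: one level per calendar day, which aggregate() then groups on.

```r
# Illustrative data with Date_Time already POSIXct (UTC assumed)
rain <- data.frame(
  Rain_cm   = c(0.6, NA, 0.2, NA, 1.2, 0.2),
  Date_Time = as.POSIXct(c("2019-06-28 15:00:00", "2019-06-28 15:15:00",
                           "2019-06-28 16:45:00", "2019-06-29 10:00:00",
                           "2019-06-29 10:15:00", "2019-06-29 13:00:00"),
                         tz = "UTC")
)

# cut() on date-times with breaks = "day" returns a factor of calendar days
rain$Day <- cut(rain$Date_Time, breaks = "day")
levels(rain$Day)

# The formula method drops the NA rows, then sums within each day
precipTotals <- aggregate(Rain_cm ~ Day, data = rain, FUN = sum)
precipTotals
```

cut() on date-times also accepts other interval strings such as "hour" or "week" if you later need a different resolution.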

Upvotes: 0

Georgery

Reputation: 8117

library(dplyr)      # mutate(), group_by(), summarise(), %>%
library(lubridate)  # ymd_hms()
library(tibble)     # tribble()

df <- tribble(
    ~DOY, ~Rain, ~Rain_cm, ~Date_Time,
    179,  6,     0.6,      "2019-06-28 15:00:00",
    179,  0,     NA,       "2019-06-28 15:15:00",
    179,  2,     0.2,      "2019-06-28 16:45:00",
    180,  0,     NA,       "2019-06-29 10:00:00",
    180,  10.2,  1.2,      "2019-06-29 10:15:00",
    180,  2,     0.2,      "2019-06-29 13:00:00"
)

# Parse the timestamps, derive the calendar date, then total the rain per date
df %>%
    mutate(Date_Time = ymd_hms(Date_Time),
           Date = as.Date(Date_Time)) %>%
    group_by(Date) %>%
    summarise(perDate = sum(Rain_cm, na.rm = TRUE))

  Date       perDate
  <date>       <dbl>
1 2019-06-28     0.8
2 2019-06-29     1.4

Upvotes: 0
