Reputation: 13
Lets say I have a data frame as follows:
gageID date flow_cms
1011000 1937-02-19 25.768334
1011000 1937-02-20 24.918828
1011000 1937-02-21 24.069322
I want to aggregate the rows that have the same month summing the flow, and store the result into a new data value: the first day of every month; in order to obtain the following output:
gageID date sum_monthly_flow
1011000 1937-02-01 500.2222
1011000 1937-03-01 589.222
I'm using this line:
>rowsum(qfile$flow_cms, format(qfile$date, '%Y-%m-01'))
and I obtain the right sum, but I want also to reduce the record days in a unique day: the first of every month! with the strip shows above, R cannot recognize the left coloumn as data (or date).
Help would be very much appreciated!
Upvotes: 1
Views: 1971
Reputation: 42544
Using data.table
and lubridate
you can try:
library(data.table)
setDT(qfile)[, lapply(.SD, sum), by = .(gageID, date = lubridate::floor_date(date, "month"))]
gageID date flow_cms 1: 1011000 1937-02-01 74.75648 2: 1011000 1937-03-01 601.44400
Note that it is assumed that date
is already of class Date
and that gageID
is another grouping parameter.
Alternatively, the call to a function from another package can be avoided using data.table
's own mday()
function (thanks to @Henrik):
setDT(qfile)[, lapply(.SD, sum), by = .(gageID, date = date - mday(date) + 1)]
Here, Abdou's sample data is used:
qfile <- structure(list(gageID = c(1011000L, 1011000L, 1011000L, 1011000L,
1011000L), date = structure(c(-12004, -12003, -12002, -11975,
-11994), class = "Date"), flow_cms = c(25.76833, 24.91883, 24.06932,
12.222, 589.222)), .Names = c("gageID", "date", "flow_cms"), row.names = c(NA,
-5L), class = "data.frame")
Upvotes: 0
Reputation: 13274
Another solution:
df
gageID date flow_cms
1 1011000 1937-02-19 25.76833
2 1011000 1937-02-20 24.91883
3 1011000 1937-02-21 24.06932
4 1011000 1937-03-20 12.22200
5 1011000 1937-03-01 589.22200
df1 = aggregate(list(flow_cms=df$flow_cms),by=list(dates=cut(as.POSIXct(df$date),"month")),sum)
df1
dates flow_cms
1 1937-02-01 74.75648
2 1937-03-01 601.44400
Upvotes: 2
Reputation: 24945
First make sure your "date" column is properly formatted as a date object in R:
qfile$date <- as.Date(qfile$date, format = "%Y-%m-%d")
Then we can use format
to extract the month and year, and group_by
that for a sum and take the first date:
library(dplyr)
qfile %>% mutate(monthyear = as.character(format(date, "%m-%Y"))) %>%
arrange(date) %>%
group_by(monthyear) %>%
summarise(date=date[1], flow = sum(flow_cms))
This will give you the first taken record for each month in the data.
Upvotes: 1