Reputation: 41
I have daily data and I'd like to count the number of times per year the daily measurement is within a certain range. The data is also grouped by factor, so I need to know number of days per year for each factor that is within a certain range (e.g. 15 to 18)
I have a large dataset of over 100 years but here is some data just for this example:
Date <- seq(as.Date("2010/01/01"), by = "day", length.out = 1095)
People <- sample.int(20, 1095, replace = TRUE)
Country <- sample(x = c("Australia", "Canada", "France"), size = 1095, replace = TRUE)
mydf <- data.frame(Date, People, Country)
I would like to know the number of times per year the value of "People" is between 15 and 18 for each country.
So my output would be a new dataframe, something like:
myDate People Country
2010 45 Australia
2010 10 Canada
2010 24 France
2011 33 Australia
2011 100 Canada
2011 4 France
2012 21 Australia
2012 66 Canada
2012 211 France
Any help would be greatly appreciated as I'm struggling with this and have looked for answers but I can't find a solution that involves the date and factor.
Upvotes: 4
Views: 486
Reputation: 43169
You could achieve it with lubridate
and dplyr
. Use year()
to grab the year and group by the year and the country. The last step is a conditional summing up:
library(dplyr)
library(lubridate)
mydf %>%
group_by(year = year(Date), Country) %>%
summarise(p = sum(between(People, 15, 18)))
year Country p
<dbl> <fct> <int>
1 2010. Australia 22
2 2010. Canada 34
3 2010. France 26
4 2011. Australia 21
5 2011. Canada 30
6 2011. France 13
7 2012. Australia 28
8 2012. Canada 31
9 2012. France 23
Upvotes: 3
Reputation: 263451
And here is the requisite base solution. Key points: convert dates to character year values with format.Date
and the by-grouping needs to be a list-object:
aggregate( mydf['People'], list(mydf[['Country']], format(mydf$Date, "%Y") ),
FUN=function(d) sum( d >=15 & d <=18) )
Group.1 Group.2 People
1 Australia 2010 25
2 Canada 2010 22
3 France 2010 24
4 Australia 2011 27
5 Canada 2011 19
6 France 2011 33
7 Australia 2012 19
8 Canada 2012 33
9 France 2012 24
If you want the resulting dataframe to have different column names then add those to the list inside the by-group definition:
aggregate( mydf['People'], list(Cntry=mydf[['Country']], Yr=format(mydf$Date, "%Y") ),
function(d) sum( d >=15 & d <=18) )
Cntry Yr People
1 Australia 2010 25
2 Canada 2010 22
3 France 2010 24
4 Australia 2011 27
5 Canada 2011 19
6 France 2011 33
7 Australia 2012 19
8 Canada 2012 33
9 France 2012 24
Upvotes: 3
Reputation: 107687
Consider base R aggregation:
mydf$Year <- format(mydf$Date, "%Y")
mydf$NumberTime15_18 <- ifelse(mydf$People >= 15 & mydf$People <= 18, 1, 0)
aggregate(NumberTime15_18 ~ Country + Year, mydf, sum)
# Country Year NumberTime15_18
# 1 Australia 2010 22
# 2 Canada 2010 17
# 3 France 2010 28
# 4 Australia 2011 26
# 5 Canada 2011 24
# 6 France 2011 20
# 7 Australia 2012 16
# 8 Canada 2012 27
# 9 France 2012 21
Upvotes: 1
Reputation: 1709
For a data.table
solution:
library(data.table)
setDT(mydf)[,(People=sum(between(People, 15, 18))), by = .(year(Date), Country)]
year Country V1
1: 2010 Canada 22
2: 2010 Australia 17
3: 2010 France 22
4: 2011 Canada 23
5: 2011 France 22
6: 2011 Australia 26
7: 2012 Canada 21
8: 2012 France 29
9: 2012 Australia 26
Upvotes: 2