Reputation: 125
Hi, I am new to R and would like some advice on how to calculate sums within a data frame. My data looks like this:
       year value
Row 1  2001    10
Row 2  2001    20
Row 3  2002    15
Row 4  2002    NA
Row 5  2003     5
How can I use R to return the total value by year, as in the table below? Many thanks!
       year sum value
Row 1  2001        30
Row 2  2002        15
Row 3  2003         5
Upvotes: 0
Views: 208
Reputation: 99371
There is also rowsum, which is quite efficient:
with(mydf, rowsum(value, year, na.rm=TRUE))
#      [,1]
# 2001   30
# 2002   15
# 2003    5
Or tapply
with(mydf, tapply(value, year, sum, na.rm=TRUE))
# 2001 2002 2003
#   30   15    5
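Note that tapply returns a named vector rather than a data frame. If a data frame like the one in the question is wanted, a minimal sketch of the conversion could look like this (the names sums, result and sum_value are only illustrative, and mydf is rebuilt here so the snippet runs on its own):

```r
# Example data from the question
mydf <- data.frame(
  year  = c(2001, 2001, 2002, 2002, 2003),
  value = c(10, 20, 15, NA, 5)
)

# Named vector of sums, one element per year
sums <- with(mydf, tapply(value, year, sum, na.rm = TRUE))

# The years are stored as the names of the vector, so convert them back
result <- data.frame(
  year      = as.numeric(names(sums)),
  sum_value = as.vector(sums)
)
print(result)
```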
Or as.data.frame(xtabs(...))
as.data.frame(xtabs(mydf[2:1]))
#   year Freq
# 1 2001   30
# 2 2002   15
# 3 2003    5
Upvotes: 2
Reputation: 4537
LyzandeR has provided a working answer in base R. If you want to use dplyr, which is a great data management tool, you could do:
library(dplyr)

year <- c(2001, 2001, 2002, 2002, 2003)
value <- c(10, 20, 15, NA, 5)
mydf <- data.frame(year, value)

# Sum value within each year, ignoring missing values
mydf %>%
  group_by(year) %>%
  summarise(sum_values = sum(value, na.rm = TRUE))
An advantage of dplyr in this case is that on larger datasets it can be much faster than base R approaches such as aggregate. I also find it more readable.
Upvotes: 1
Reputation: 37889
There are lots of ways to do that. One of them is to use the aggregate function, like this:
year <- c(2001,2001,2002,2002,2003)
value <- c(10,20,15,NA,5)
mydf<-data.frame(year,value)
mytable <- aggregate(mydf$value, by=list(mydf$year), FUN=sum, na.rm=TRUE)
colnames(mytable) <- c('Year','sum_values')
> mytable
  Year sum_values
1 2001         30
2 2002         15
3 2003          5
This link might also be helpful.
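aggregate also has a formula interface, which keeps the column names and avoids the by=list(...) step. A minimal sketch on the same data, with the variable name sums_by_year chosen only for illustration:

```r
# Example data from the question
mydf <- data.frame(
  year  = c(2001, 2001, 2002, 2002, 2003),
  value = c(10, 20, 15, NA, 5)
)

# Formula interface: sum value within each year.
# na.rm = TRUE is passed through to sum().
sums_by_year <- aggregate(value ~ year, data = mydf, FUN = sum, na.rm = TRUE)

# Rename the summed column to match the output above
names(sums_by_year)[2] <- "sum_values"
print(sums_by_year)
```

One thing to be aware of: the formula interface drops rows containing NA before summing (its default na.action is na.omit), so a year whose values are all NA would disappear from the result. Passing na.action = na.pass should keep such a year in the output.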
Upvotes: 2