Reputation: 115
I have a dataframe with two columns (year and precipitation). In a single column, the year is listed such that it starts from 1900 and ends at 2014 and again starts with 1900. In another column I have precipitation value of the respective year. Now i want to add all the precipitation of 1900 as 1 value and 1901 as 1 to up to 2014. My data looks like:
Year Precipitation
1900 4.826
1901 37.592
2014 14.224
1900 45.974
1901 46.228
2014 79.502
1900 52.578
1901 22.30
2014 15.25
The results should look like:
Year Precipitation
1900 103.378
1901 106.12
2014 108.976
So far I wrote a code but it does not work, if anybody can fix it?
data=read.table('precipitation.csv',header=T,sep=',')
frame=data.frame(data)
cumcum=tapply(frame$Precipitation, cumsum(frame$year==1), FUN=sum, na.rm=TRUE)
Thanks
Upvotes: 5
Views: 9117
Reputation: 3279
1 liner -- try:
aggregate(frame['Precipitation'], by=frame['Year'], sum)
Reference: Consolidate duplicate rows
Upvotes: 5
Reputation: 4126
Try data.table
library(data.table)
frame=fread('precipitation.csv',header=TRUE,sep=',')
frame[, sum(Precipitation), by = Year]
Upvotes: 4
Reputation: 1929
That seems overly complicated. Why not just do the sums separately?
s.1900 <- sum(frame$Precipitation[frame$year == 1900])
s.1901 <- sum(frame$Precipitation[frame$year >= 1901 & frame$year <= 2013])
s.2014 <- sum(frame$Precipitation[frame$year == 2014])
It actually leaves your code readable for later.
Upvotes: 2