Juvin

Reputation: 115

Sum values by year in a data frame in R

I have a data frame with two columns, Year and Precipitation. The Year column runs from 1900 to 2014 and then starts again at 1900, repeating. The Precipitation column holds the value for the corresponding year. I want to add up all the precipitation values for 1900 into one value, all those for 1901 into one value, and so on up to 2014. My data looks like:

Year    Precipitation

1900    4.826
1901    37.592
2014    14.224
1900    45.974
1901    46.228
2014    79.502
1900    52.578
1901    22.30
2014    15.25

The results should look like:

Year   Precipitation

1900   103.378
1901   106.12
2014   108.976

So far I have written the code below, but it does not work. Can anybody fix it?

data=read.table('precipitation.csv',header=T,sep=',')
frame=data.frame(data)
cumcum=tapply(frame$Precipitation, cumsum(frame$year==1), FUN=sum, na.rm=TRUE)

Thanks

Upvotes: 5

Views: 9117

Answers (3)

csiu

Reputation: 3279

A one-liner to try:

aggregate(frame['Precipitation'], by=frame['Year'], sum)

Reference: Consolidate duplicate rows
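
For anyone who wants to check this without the CSV, here is a minimal sketch that rebuilds the sample rows from the question by hand (the real file is assumed to have the same `Year` and `Precipitation` headers). It uses the formula interface of `aggregate()`, and also shows the `tapply()` attempt from the question grouped by `Year` directly. The original attempt most likely fails because R column names are case-sensitive (`frame$year` is not `frame$Year`) and because no year equals 1.

```r
# Sample rows copied from the question; the real data covers 1900 to 2014.
frame <- data.frame(
  Year = rep(c(1900, 1901, 2014), times = 3),
  Precipitation = c(4.826, 37.592, 14.224,
                    45.974, 46.228, 79.502,
                    52.578, 22.30, 15.25)
)

# Formula interface: sum Precipitation within each Year.
# Returns a data frame with one row per distinct year.
totals <- aggregate(Precipitation ~ Year, data = frame, FUN = sum)
print(totals)

# tapply() with Year itself as the grouping variable.
# Returns a named array whose names are the years.
totals_vec <- tapply(frame$Precipitation, frame$Year, sum, na.rm = TRUE)
print(totals_vec)
```

On the sample rows, both results should match the expected totals listed in the question.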

Upvotes: 5

vrajs5

Reputation: 4126

Try data.table:

library(data.table)
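frame <- fread('precipitation.csv', header = TRUE, sep = ',')
# Sum Precipitation within each Year; naming the column avoids the default "V1"
frame[, .(Precipitation = sum(Precipitation)), by = Year]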

Upvotes: 4

LauriK

Reputation: 1929

That seems overly complicated. Why not just do the sums separately?

# The column is named Year; R column names are case-sensitive
s.1900 <- sum(frame$Precipitation[frame$Year == 1900])
s.1901 <- sum(frame$Precipitation[frame$Year == 1901])
# ... one line per year in between ...
s.2014 <- sum(frame$Precipitation[frame$Year == 2014])

It actually leaves your code readable for later.
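
With 115 distinct years, writing one line per year gets long. As a rough sketch of the same "sum each year separately" idea, the subsetting can be applied to every distinct year in turn. The data frame below is rebuilt by hand from the sample rows in the question, and the names `years`, `sums` and `result` are made up for illustration.

```r
# Sample rows copied from the question; the real data covers 1900 to 2014.
frame <- data.frame(
  Year = rep(c(1900, 1901, 2014), times = 3),
  Precipitation = c(4.826, 37.592, 14.224,
                    45.974, 46.228, 79.502,
                    52.578, 22.30, 15.25)
)

# Every distinct year, in ascending order.
years <- sort(unique(frame$Year))

# Same subsetting as the per-year lines, applied to each year in turn.
sums <- vapply(
  years,
  function(y) sum(frame$Precipitation[frame$Year == y], na.rm = TRUE),
  numeric(1)
)

# Collect the per-year sums into a two-column data frame.
result <- data.frame(Year = years, Precipitation = sums)
print(result)
```

This stays close to the explicit version while scaling to any number of years.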

Upvotes: 2
