Reputation: 617
I have a large data frame where some rows have repeated values in some of their columns. I want to keep the repeated values and sum those which are different. Below there is a sample of my data:
data<-data.frame(season=c(2008,2009,2010,2011,2011,2012,2000,2001),
lic=c(132228,140610,149215,158559,158559,944907,37667,45724),
client=c(174,174,174,174,174,174,175,175),
qtty=c(31,31,31,31,31,31,36,26),
held=c(60,65,58,68,68,70,29,23),
catch=c(7904,6761,9236,9323.2,801,NA,2330,3594.5),
potlift=c(2715,2218,3000,3887,750,NA,2314,3472))
.
season lic client qtty held catch potlift
2008 132228 174 31 60 7904 2715
2009 140610 174 31 65 6761 2218
2010 149215 174 31 58 9236 3000
2011 158559 174 31 68 9323.2 3887
2011 158559 174 31 68 801 750
2012 944907 174 31 70 NA NA
2000 37667 175 36 29 2330 2314
2001 45724 175 26 23 3594.5 3472
Note that the season 2011 is repeated, each variable (client... held
), except catch
and potlift
. I need to keep the values of (client... held
) and sum catch
and potlift
; therefore my new data frame should be like the example below:
season lic client qtty held catch potlift
2008 132228 174 31 60 7904 2715
2009 140610 174 31 65 6761 2218
2010 149215 174 31 58 9236 3000
2011 158559 174 31 68 10124.2 4637
2012 944907 174 31 70 NA NA
2000 37667 175 36 29 2330 2314
2001 45724 175 26 23 3594.5 3472
I have attempted to do so using aggregate
, but this function sum everything. Any help will be appreciated.
Upvotes: 1
Views: 129
Reputation: 193517
aggregate
seems to work fine for me, but I'm not sure what you were trying:
> aggregate(cbind(catch, potlift) ~ ., data, sum, na.action = "na.pass")
season lic client qtty held catch potlift
1 2001 45724 175 26 23 3594.5 3472
2 2000 37667 175 36 29 2330.0 2314
3 2010 149215 174 31 58 9236.0 3000
4 2008 132228 174 31 60 7904.0 2715
5 2009 140610 174 31 65 6761.0 2218
6 2011 158559 174 31 68 10124.2 4637
7 2012 944907 174 31 70 NA NA
Here, use cbind
to identify the columns that you want to aggregate by. You can then specify all the other columns, or just use .
to indicate "use all other columns not mentioned in the cbind
call.
Upvotes: 2
Reputation: 44525
data$catch <- with(data, ave(catch,list(lic,client,qtty,held),FUN=sum))
data$potlift <- with(data, ave(potlift,list(lic,client,qtty,held),FUN=sum))
unique(data)
season lic client qtty held catch potlift
1 2008 132228 174 31 60 7904.0 2715
2 2009 140610 174 31 65 6761.0 2218
3 2010 149215 174 31 58 9236.0 3000
4 2011 158559 174 31 68 10124.2 4637
6 2012 944907 174 31 70 NA NA
7 2000 37667 175 36 29 2330.0 2314
8 2001 45724 175 26 23 3594.5 3472
Upvotes: 2