Reputation: 177
Hey everyone I have a dataset with about 8 for which I want to calculate the largest volume for each combination of city and year.
The dataset looks like this:
city sales volume year avg price
abilene 239 12313 2000 7879
kansas 2324 18765 2000 2424
nyc 2342 987651 2000 3127
abilene 3432 34342 2001 1234
nyc 2342 10000 2001 3127
kansas 176 3130 2001 879
kansas 123 999650 2002 2424
abilene 3432 34342 2002 1234
nyc 2342 98000 2002 3127
I want my dataset to look like this :
city year volume
nyc 2000 987651
abilene 2001 34342
kansas 2002 999650
I used the ddplyr to find the maximum volume of each city.
newdf=ddply(df,c('city','year'),summarise, max(volume))
However this gives me a dataset with maximum value of each city for each year. However, I just want to know the maximum volume comparing all cities for an year. Thank you.
Upvotes: 3
Views: 60
Reputation: 15458
library(dplyr)
df %>% #df is your dataframe
group_by(year)%>%
filter(volume==max(volume))
Source: local data frame [3 x 5]
Groups: year
city sales volume year avg_price
1 nyc 2342 987651 2000 3127
2 abilene 3432 34342 2001 1234
3 kansas 123 999650 2002 2424
#updated : If you are grouping by both city and year
df %>% #df is your dataframe
group_by(year,city)%>%
filter(volume==max(volume))
Source: local data frame [9 x 5]
Groups: year, city
city sales volume year avg_price
1 abilene 239 12313 2000 7879
2 kansas 2324 18765 2000 2424
3 nyc 2342 987651 2000 3127
4 abilene 3432 34342 2001 1234
5 nyc 2342 10000 2001 3127
6 kansas 176 3130 2001 879
7 kansas 123 999650 2002 2424
8 abilene 3432 34342 2002 1234
9 nyc 2342 98000 2002 3127
Upvotes: 1