Reputation: 33
I'm working with an events dataset and need help creating a new data frame by summing a specific variable within groups.
For example, let's say I had a dataset of all cars sold in a county over the past n years, with the name of the dealership, the month the car was sold, the year the car was sold, and the number of cars sold. I want to create a new data frame where each row gives the number of cars sold by a particular dealership in a given year.
In other words, I want to go from something like this:
Dealership  Month  Year  # of Cars
Bobs        April  2016  12
Toms        March  2016  8
Bobs        July   2016  20
Toms        June   2016  4
...
to something like this:
Dealership  Month  Year  # of Cars
Bobs        ?      2016  32
Toms        ?      2016  12
...
I'm not sure whether this will cause an error, because the month values (or other columns in a bigger dataset) differ between the rows being combined. I don't need that information in the result.
Can anyone help? Many thanks.
Upvotes: 0
Views: 217
Reputation: 3947
We can only do so much without a reproducible example, but this is probably covered by dplyr
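library(dplyr)
# Sum the car counts per dealership and year; Month is not a grouping column, so it is dropped from the result
yourdata %>% group_by(Dealership, Year) %>% summarise(Ncars = sum(`# of Cars`))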
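As a minimal sketch of how that might look on the four sample rows from the question: the data frame name `cars` and the result name `yearly` are made up for illustration, and I'm assuming the count column is literally named `# of Cars` (hence `check.names = FALSE` and the backticks). The `.groups = "drop"` argument assumes dplyr 1.0.0 or later; it only removes the leftover grouping and can be omitted.

```r
library(dplyr)

# Sample rows copied from the question; `cars` is a hypothetical name.
# check.names = FALSE keeps the non-syntactic column name `# of Cars` as-is.
cars <- data.frame(
  Dealership = c("Bobs", "Toms", "Bobs", "Toms"),
  Month = c("April", "March", "July", "June"),
  Year = c(2016, 2016, 2016, 2016),
  `# of Cars` = c(12, 8, 20, 4),
  check.names = FALSE
)

# One row per Dealership/Year combination, with the car counts summed.
# Month is not listed in group_by() or summarise(), so it is not carried over.
yearly <- cars %>%
  group_by(Dealership, Year) %>%
  summarise(Ncars = sum(`# of Cars`), .groups = "drop")

print(yearly)
```

For the sample rows this should give 32 for Bobs and 12 for Toms in 2016. Differing Month values don't cause an error: any column that is neither a grouping column nor computed in summarise() is simply left out of the result.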
Upvotes: 1