Dinho

Reputation: 724

How to use dplyr to subset dataframe based on group_by in R

I have a dataframe that consists of sales and population data. Reference below:

Location Sales Population Month
A         10       480     Jan
B         12       480     Jan
C         14       480     Jan 
A         13       480     Jan
B         11       480     Jan
C         16       480     Jan
A         12       480     Jan
B         10       480     Jan
C         14       480     Jan

What I would like to do is use dplyr to group by month (only January is shown here, but the data runs through December) and get the sum of sales along with that month's population.

I get the sales with this line of code, but my population comes out as NA:

test2 <- df_2019 %>% group_by(Month) %>% summarize(SumSales = sum(Total_Sales, na.rm = TRUE), Pop_Sum = sum(Population, na.rm = TRUE))

    Month SumSales    Pop_Sum
1   Apr 285591.9    134786490
2   Aug 384246.5    131901771
3   Dec 254748.9    89512147
4   Feb 251463.7    135634878
5   Jan 243624.6    135901304
6   Jul 286468.8    134335668
7   Jun 283395.2    134335668
8   Mar 289453.8    135658132
9   May 365272.2    134768586
10  Nov 291248.8    89576444
11  Oct 375402.2    89589288
12  Sep 290888.5    132878020

DESIRED OUTPUT would look like this:

    Month SumSales    Pop_Sum
1   Apr 285591.9    437
2   Aug 384246.5    440
3   Dec 254748.9    443
4   Feb 251463.7    435
5   Jan 243624.6    480
6   Jul 286468.8    455
7   Jun 283395.2    465
8   Mar 289453.8    460
9   May 365272.2    479
10  Nov 291248.8    435
11  Oct 375402.2    444
12  Sep 290888.5    451

Each month's Population appears on multiple rows with the same value, while the sales values are unique. Any help would be appreciated!

Upvotes: 0

Views: 59

Answers (1)

Ronak Shah

Reputation: 389235

Since the population values are already calculated and repeat on every row of a month, we can take any one of them per month instead of summing. For example, taking the first value of Population in each group:

library(dplyr)

df_2019 %>% 
  group_by(Month) %>% 
  summarize(SumSales = sum(Total_Sales, na.rm = TRUE), 
            Pop_Sum = first(Population))
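To see the approach on something reproducible, here is a minimal sketch on a toy data frame. The column names follow the question's code (Total_Sales, Population, Month); the Feb rows and the extra n_pop column are assumptions added only for illustration. n_pop counts the distinct population values per month, so a value above 1 would mean that first() is hiding a discrepancy.

```r
library(dplyr)

# Toy data modeled on the question's sample.
# The Feb rows and their values are made up for illustration.
df_2019 <- data.frame(
  Month       = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  Total_Sales = c(10, 12, 14, 13, 11, 16),
  Population  = c(480, 480, 480, 435, 435, 435)
)

result <- df_2019 %>%
  group_by(Month) %>%
  summarize(
    SumSales = sum(Total_Sales, na.rm = TRUE),  # sales differ per row, so sum them
    Pop_Sum  = first(Population),               # population repeats, so take one value
    n_pop    = n_distinct(Population)           # sanity check: expected to be 1 per month
  )

print(result)
```

If n_pop is ever greater than 1, the population is not constant within that month, and taking the first value may not be what you want.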

Upvotes: 1
