Reputation: 326
I currently have a dataframe that looks like this:
census_2
# A tibble: 236,844 x 5
STATE RACE CENSUS2010POP POPESTIMATE2012
<chr> <dbl> <dbl> <dbl>
1 01 1 37991 37824
2 01 1 38150 38597
3 01 1 39738 37653
4 01 1 39827 38113
5 01 1 39353 39628
6 01 1 39520 39821
7 01 1 39813 39425
8 01 2 39695 39636
9 01 2 40012 39957
10 01 2 42073 39856
# … with 236,834 more rows, and 1 more variable:
# POPESTIMATE2016 <dbl>
The State columns are going to be something I want to universally group by. I will have multiple rows representing the same value in the race column. The CENSUS2010POP column is a census population estimate for that particular racial group in that particular state.
So, what I am ultimately wanting to do, is to get a column for each racial group population estimate for the state overall. For example, say in the original dataset, state = 1 & race = 1 refers to a population in State 1 for white constituents. What I want, is to summate the observations in that state for white population estmates and turn that estimate into a column. Then for the third column in the dataset, I'd have a summation of the population estimates in that particular state for non-whites, say it is equivalent to RACE = 2 in the original dataset above. Here is a brief example:
STATE WHITE_CENSUS10POP NONWHITE_CENSUS10POP
1 = summation(nrow where state == 1 & race == 1) = summation(nrow where state == 1 & race == 2)
2. = summation(nrow where state == 2 & race == 1) = summation(nrow where state == 2 & race == 2)
3
4
5
6
...
50
Upvotes: 1
Views: 148
Reputation: 887961
There are multiple ways to do this. A simple option without having other package dependencies (assuming there are only two 'race') is to do a group by on 'STATE' and summarise
with two columns by subsetting the 'CENSUS2010POP', based on the value of 'race' and get the sum
.
library(dplyr)
df1 %>%
group_by(STATE) %>%
summarise(WHITE_CENSUS10POP = sum(CENSUS2010POP[race == 1]),
NONWHITE_CENUS10POP = sum(CENSUS2010POP[race == 2]),
.groups = 'drop')
A more general option is to get the sum
of 'CENSUS2010POP' grouped by both 'STATE', 'race' and then pivot the 'long' format data to 'wide' with pivot_wider
library(tidyr)
df1 %>%
group_by(STATE, race) %>%
summarise(value = sum(CENSUS2010POP), .groups = 'drop') %>%
pivot_wider(names_from = race, values_from = value)
Upvotes: 1