Evan
Evan

Reputation: 1499

How can I sum up columns based on a separate column variable?

I have a data frame like:

        date    county      state score1 score2  FIPS
1 2018-01-21 Snohomish Washington     2      1 53061
2 2018-01-22 Snohomish Washington     1      0 53061
3 2018-01-23 Snohomish Washington     7      4 53061
4 2018-01-24      Cook   Illinois     1      0 17031
5 2018-01-24      Cook   Illinois     1      0 17031
6 2018-01-22      West   Louisiana    3      4  1008

I would like to tally up the scores for different counties by matching them up by their FIPS. For example I want to sum up score1 of Snohomish county by using FIPS 53061 and then having 2+7+1 have a score1 for FIPS 53061 be 10. Score2 would equal 5.

I would like the final data frame to look like:

      county      state    score1 score2  FIPS
    Snohomish  Washington    10     5   53061
    Cook       Illinois      2      0   17031
    West       Louisiana     3      4   1008

How can this be done?

Upvotes: 0

Views: 49

Answers (2)

akrun
akrun

Reputation: 886938

We can group by 'county', 'state', 'FIPS' and get the sum of columns that starts with 'score' in column name

library(dplyr)
df1 %>%
    group_by(county, state, FIPS) %>% 
    summarise_at(vars(starts_with('score')), sum)
# A tibble: 3 x 5
# Groups:   county, state [3]
#  county    state       FIPS score1 score2
#  <chr>     <chr>      <int>  <int>  <int>
#1 Cook      Illinois   17031      2      0
#2 Snohomish Washington 53061     10      5
#3 West      Louisiana   1008      3      4

Or using data.table

library(data.table)
setDT(df1)[, lapply(.SD, sum), .(county, state, FIPS),
        .SDcols = startsWith(names(df1), "score")]
#      county      state  FIPS score1 score2
#1: Snohomish Washington 53061     10      5
#2:      Cook   Illinois 17031      2      0
#3:      West  Louisiana  1008      3      4

Or with aggregate from base R

aggregate(.~ county + state + FIPS, df1[setdiff(names(df1), 'date')], sum)
#     county      state  FIPS score1 score2
#1      West  Louisiana  1008      3      4
#2      Cook   Illinois 17031      2      0
#3 Snohomish Washington 53061     10      5

data

df1 <- structure(list(date = c("2018-01-21", "2018-01-22", "2018-01-23", 
"2018-01-24", "2018-01-24", "2018-01-22"), county = c("Snohomish", 
"Snohomish", "Snohomish", "Cook", "Cook", "West"), state = c("Washington", 
"Washington", "Washington", "Illinois", "Illinois", "Louisiana"
), score1 = c(2L, 1L, 7L, 1L, 1L, 3L), score2 = c(1L, 0L, 4L, 
0L, 0L, 4L), FIPS = c(53061L, 53061L, 53061L, 17031L, 17031L, 
1008L)), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6"))

Upvotes: 1

arg0naut91
arg0naut91

Reputation: 14764

Try:

library(dplyr)

df %>%
  group_by(county, state, FIPS) %>%
  summarise_at(vars(starts_with('score')), sum)

Output:

# A tibble: 3 x 5
# Groups:   county, state [3]
  county    state       FIPS score1 score2
  <fct>     <fct>      <int>  <int>  <int>
1 Cook      Illinois   17031      2      0
2 Snohomish Washington 53061     10      5
3 West      Louisiana   1008      3      4

Or in data.table:

library(data.table)

setDT(df)[, lapply(.SD, sum), by = .(county, state, FIPS), .SDcols = c('score1', 'score2')]

Upvotes: 2

Related Questions