Reputation: 1499
I have a data frame like:
date county state score1 score2 FIPS
1 2018-01-21 Snohomish Washington 2 1 53061
2 2018-01-22 Snohomish Washington 1 0 53061
3 2018-01-23 Snohomish Washington 7 4 53061
4 2018-01-24 Cook Illinois 1 0 17031
5 2018-01-24 Cook Illinois 1 0 17031
6 2018-01-22 West Louisiana 3 4 1008
I would like to tally up the scores for different counties by matching them up by their FIPS. For example I want to sum up score1
of Snohomish county by using FIPS 53061
and then having 2+7+1 have a score1 for FIPS 53061 be 10. Score2
would equal 5.
I would like the final data frame to look like:
county state score1 score2 FIPS
Snohomish Washington 10 5 53061
Cook Illinois 2 0 17031
West Louisiana 3 4 1008
How can this be done?
Upvotes: 0
Views: 49
Reputation: 886938
We can group by 'county', 'state', 'FIPS' and get the sum
of columns that starts with 'score' in column name
library(dplyr)
df1 %>%
group_by(county, state, FIPS) %>%
summarise_at(vars(starts_with('score')), sum)
# A tibble: 3 x 5
# Groups: county, state [3]
# county state FIPS score1 score2
# <chr> <chr> <int> <int> <int>
#1 Cook Illinois 17031 2 0
#2 Snohomish Washington 53061 10 5
#3 West Louisiana 1008 3 4
Or using data.table
library(data.table)
setDT(df1)[, lapply(.SD, sum), .(county, state, FIPS),
.SDcols = startsWith(names(df1), "score")]
# county state FIPS score1 score2
#1: Snohomish Washington 53061 10 5
#2: Cook Illinois 17031 2 0
#3: West Louisiana 1008 3 4
Or with aggregate
from base R
aggregate(.~ county + state + FIPS, df1[setdiff(names(df1), 'date')], sum)
# county state FIPS score1 score2
#1 West Louisiana 1008 3 4
#2 Cook Illinois 17031 2 0
#3 Snohomish Washington 53061 10 5
df1 <- structure(list(date = c("2018-01-21", "2018-01-22", "2018-01-23",
"2018-01-24", "2018-01-24", "2018-01-22"), county = c("Snohomish",
"Snohomish", "Snohomish", "Cook", "Cook", "West"), state = c("Washington",
"Washington", "Washington", "Illinois", "Illinois", "Louisiana"
), score1 = c(2L, 1L, 7L, 1L, 1L, 3L), score2 = c(1L, 0L, 4L,
0L, 0L, 4L), FIPS = c(53061L, 53061L, 53061L, 17031L, 17031L,
1008L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6"))
Upvotes: 1
Reputation: 14764
Try:
library(dplyr)
df %>%
group_by(county, state, FIPS) %>%
summarise_at(vars(starts_with('score')), sum)
Output:
# A tibble: 3 x 5
# Groups: county, state [3]
county state FIPS score1 score2
<fct> <fct> <int> <int> <int>
1 Cook Illinois 17031 2 0
2 Snohomish Washington 53061 10 5
3 West Louisiana 1008 3 4
Or in data.table
:
library(data.table)
setDT(df)[, lapply(.SD, sum), by = .(county, state, FIPS), .SDcols = c('score1', 'score2')]
Upvotes: 2