Julia B
Julia B

Reputation: 13

R: sum row based on several conditions

I am working on my thesis with little knowledge of r, so the answer this question may be pretty obvious.

I have the a dataset looking like this:

county<-c('1001','1001','1001','1202','1202','1303','1303')
naics<-c('423620','423630','423720','423620','423720','423550','423720')
employment<-c(5,6,5,5,5,6,5)
data<-data.frame(county,naics,employment)

For every county, I want to sum the value of employment of rows with naics '423620' and '423720'. (So two conditions: 1. same county code 2. those two naics codes) The row in which they are added should be the first one ('423620'), and the second one ('423720') should be removed

The final dataset should look like this:

county2<-c('1001','1001','1202','1303','1303')
naics2<-c('423620','423630','423620','423550','423720')
employment2<-c(10,6,10,6,5)
data2<-data.frame(county2,naics2,employment2)

I have tried to do it myself with aggregate and rowSum, but because of the two conditions, I have failed thus far. Thank you very much.

Upvotes: 1

Views: 612

Answers (2)

Murray Bozinsky
Murray Bozinsky

Reputation: 499

With such a condition, I'd first write a small helper and then pass it on to dplyr mutate:

# replace 423720 by 423620 only if both exist
onlyThoseNAICS <- function(v){
  if( ("423620" %in% v) & ("423720" %in% v) ) v[v == "423720"] <- "423620"
  v
}

data %>% 
  dplyr::group_by(county) %>% 
  dplyr::mutate(naics = onlyThoseNAICS(naics)) %>% 
  dplyr::group_by(county, naics) %>% 
  dplyr::summarise(employment = sum(employment)) %>% 
  dplyr::ungroup()

Upvotes: 0

akrun
akrun

Reputation: 886938

We can do

library(dplyr)
data$naics <- as.character(data$naics)

data %>%
    filter(naics %in% c(423620, 423720)) %>% group_by(county) %>% 
    summarise(naics = "423620", employment = sum(employment)) %>%
    bind_rows(., filter(data, !naics  %in% c(423620, 423720)))
# A tibble: 5 x 3
#   county  naics employment
#  <fctr>  <chr>      <dbl>
#1   1001 423620         10
#2   1202 423620         10
#3   1303 423620          5
#4   1001 423630          6
#5   1303 423550          6

Upvotes: 1

Related Questions