DiamondJoe12
DiamondJoe12

Reputation: 1809

Conditional Calculation for a Column in R

I have the following data:

pop.2017 <- c(434,346,345,357)
pop.2018 <- c(334,336,325,345)
pop.2019 <- c(477,346,145,345)
pop.2020 <- c(474,366,341,300)


total <- c(34,36,34,35)


incident_month_yr <- c("2017-2","2017-5","2018-2","2019-2")

df <- data.frame(incident_month_yr,pop.2017,pop.2018,pop.2019,pop.2020,total)

df['perc'] <- NA

For rows where incident_month_yr contains 2017, I want perc to equal total/pop.2017 For rows where incident_month_yr contains 2018, I want perc to equal total/pop.2018 For rows where incident_month_yr contains 2019, I want perc to equal total/pop.2019 For rows where incident_month_yr contains 2020, I want perc to equal total/pop.2020

I've tried this:

df$perc[grepl(2017,df$incident_month_yr)] <- df$total/df$pop.2017
df$perc[grepl(2018,df$incident_month_yr)] <- df$total/df$pop.2018
df$perc[grepl(2019,df$incident_month_yr)] <- df$total/df$pop.2019
df$perc[grepl(2020,df$incident_month_yr)] <- df$total/df$pop.2020

However, it's not applying the calculations to specific rows like I want. How can I do this?

Upvotes: 2

Views: 72

Answers (3)

Anoushiravan R
Anoushiravan R

Reputation: 21908

You can use the following solution:

library(dplyr)
library(stringr)


df %>% 
  mutate(perc = ifelse(str_detect(incident_month_yr, "2017"), total/pop.2017, 
                       ifelse(str_detect(incident_month_yr, "2018"), total/pop.2018, 
                       total/pop.2019)))


  incident_month_yr pop.2017 pop.2018 pop.2019 pop.2020 total       perc
1            2017-2      434      334      477      474    34 0.07834101
2            2017-5      346      336      346      366    36 0.10404624
3            2018-2      345      325      145      341    34 0.10461538
4            2019-2      357      345      345      300    35 0.10144928

Special Thanks to dear @akrun We can also replace str_detect with grepl function from base R to use fewer packages and use case_when in place of ifelse as an unnested alternative.

df %>%
  mutate(perc = case_when(
    grepl("2017", incident_month_yr) ~ total/pop.2017,
    grepl("2018", incident_month_yr) ~ total/pop.2018,
    TRUE ~ total/pop.2019
  ))

  incident_month_yr pop.2017 pop.2018 pop.2019 pop.2020 total       perc
1            2017-2      434      334      477      474    34 0.07834101
2            2017-5      346      336      346      366    36 0.10404624
3            2018-2      345      325      145      341    34 0.10461538
4            2019-2      357      345      345      300    35 0.10144928

Upvotes: 3

shosaco
shosaco

Reputation: 6155

Here are two approaches, one in base R and one using tidy data. The provided data is not tidy, that's why base R looks uncomfortable:

# Define the target
target <- c(0.07834101, 0.10404624, 0.10461538, 0.10144928)

That is our goal, calculating target.

First, use base R and ifelse:

result1 <- with(df, 
                ifelse(grepl(2017, incident_month_yr),
                       total/pop.2017,
                       ifelse(grepl(2018, incident_month_yr),
                              total/pop.2018,
                              ifelse(grepl(2019, incident_month_yr),
                                     total/pop.2019,
                                     ifelse(grepl(2020, incident_month_yr),
                                            total/pop.2020,
                                            NA)))))

identical(round(result1, 4), round(target, 4))
#> [1] TRUE

And, the tidy way, reshaping into tidy data and calculating the result:

library(dplyr)
library(tidyr)

result2 <- df %>% pivot_longer(starts_with("pop."), names_to = "pop", names_prefix = "pop.") %>% 
  filter(substr(incident_month_yr, 1, 4) == pop) %>% 
  mutate(perc = total/value) %>% 
  pull(perc)

identical(round(result2, 4), round(target, 4))
#> [1] TRUE

Upvotes: 0

akrun
akrun

Reputation: 886938

We can do this with match. Get the column names that have 'pop' substring ('nm1)', remove the characters that are not year from 'incident_month_yr', and the column name, use match to return the column index, cbind with the sequence of rows, extract the values from the 'pop' columns, divide by 'total' and assign it to 'perc' column

nm1 <- grep('pop', names(df), value = TRUE)
nm2 <- trimws(df$incident_month_yr, whitespace = '-.*')
nm3 <- trimws(nm1, whitespace = 'pop\\.')
df$perc <- df$total/df[nm1][cbind(seq_len(nrow(df)), match(nm2, nm3))]
df$perc
#[1] 0.07834101 0.10404624 0.10461538 0.10144928

In dplyr, an option is do rowwise, construct the column name from the 'incident_month_yr' with str_replace to capture the year part, append the 'pop.' as prefix, get the value and divide with 'total' column

library(stringr)
library(dplyr)
df %>%
   rowwise %>%
   mutate(perc = total/get(str_replace(incident_month_yr,
           "(\\d{4})-\\d+", 'pop.\\1'))) %>%
   ungroup

-output

# A tibble: 4 x 7
#  incident_month_yr pop.2017 pop.2018 pop.2019 pop.2020 total   perc
#  <chr>                <dbl>    <dbl>    <dbl>    <dbl> <dbl>  <dbl>
#1 2017-2                 434      334      477      474    34 0.0783
#2 2017-5                 346      336      346      366    36 0.104 
#3 2018-2                 345      325      145      341    34 0.105 
#4 2019-2                 357      345      345      300    35 0.101 

Upvotes: 2

Related Questions