Reputation: 1817

Conditional Calculation for a Column in R

I have the following data:

pop.2017 <- c(434,346,345,357)
pop.2018 <- c(334,336,325,345)
pop.2019 <- c(477,346,145,345)
pop.2020 <- c(474,366,341,300)


total <- c(34,36,34,35)


incident_month_yr <- c("2017-2","2017-5","2018-2","2019-2")

df <- data.frame(incident_month_yr,pop.2017,pop.2018,pop.2019,pop.2020,total)

df['perc'] <- NA

For rows where incident_month_yr contains 2017, I want perc to equal total/pop.2017 For rows where incident_month_yr contains 2018, I want perc to equal total/pop.2018 For rows where incident_month_yr contains 2019, I want perc to equal total/pop.2019 For rows where incident_month_yr contains 2020, I want perc to equal total/pop.2020

I've tried this:

df$perc[grepl(2017,df$incident_month_yr)] <- df$total/df$pop.2017
df$perc[grepl(2018,df$incident_month_yr)] <- df$total/df$pop.2018
df$perc[grepl(2019,df$incident_month_yr)] <- df$total/df$pop.2019
df$perc[grepl(2020,df$incident_month_yr)] <- df$total/df$pop.2020

However, it's not applying the calculations to specific rows like I want. How can I do this?

Upvotes: 2

Answers (3)

Anoushiravan R

Reputation: 21938

You can use the following solution:

library(dplyr)
library(stringr)


df %>% 
  mutate(perc = ifelse(str_detect(incident_month_yr, "2017"), total/pop.2017, 
                       ifelse(str_detect(incident_month_yr, "2018"), total/pop.2018, 
                       total/pop.2019)))


  incident_month_yr pop.2017 pop.2018 pop.2019 pop.2020 total       perc
1            2017-2      434      334      477      474    34 0.07834101
2            2017-5      346      336      346      366    36 0.10404624
3            2018-2      345      325      145      341    34 0.10461538
4            2019-2      357      345      345      300    35 0.10144928

Special Thanks to dear @akrun We can also replace str_detect with grepl function from base R to use fewer packages and use case_when in place of ifelse as an unnested alternative.

df %>%
  mutate(perc = case_when(
    grepl("2017", incident_month_yr) ~ total/pop.2017,
    grepl("2018", incident_month_yr) ~ total/pop.2018,
    TRUE ~ total/pop.2019
  ))

  incident_month_yr pop.2017 pop.2018 pop.2019 pop.2020 total       perc
1            2017-2      434      334      477      474    34 0.07834101
2            2017-5      346      336      346      366    36 0.10404624
3            2018-2      345      325      145      341    34 0.10461538
4            2019-2      357      345      345      300    35 0.10144928

Upvotes: 3

shosaco

Reputation: 6165

Here are two approaches, one in base R and one using tidy data. The provided data is not tidy, that's why base R looks uncomfortable:

# Define the target
target <- c(0.07834101, 0.10404624, 0.10461538, 0.10144928)

That is our goal, calculating target.

First, use base R and ifelse:

result1 <- with(df, 
                ifelse(grepl(2017, incident_month_yr),
                       total/pop.2017,
                       ifelse(grepl(2018, incident_month_yr),
                              total/pop.2018,
                              ifelse(grepl(2019, incident_month_yr),
                                     total/pop.2019,
                                     ifelse(grepl(2020, incident_month_yr),
                                            total/pop.2020,
                                            NA)))))

identical(round(result1, 4), round(target, 4))
#> [1] TRUE

And, the tidy way, reshaping into tidy data and calculating the result:

library(dplyr)
library(tidyr)

result2 <- df %>% pivot_longer(starts_with("pop."), names_to = "pop", names_prefix = "pop.") %>% 
  filter(substr(incident_month_yr, 1, 4) == pop) %>% 
  mutate(perc = total/value) %>% 
  pull(perc)

identical(round(result2, 4), round(target, 4))
#> [1] TRUE

Upvotes: 0

akrun

Reputation: 887851

We can do this with match. Get the column names that have 'pop' substring ('nm1)', remove the characters that are not year from 'incident_month_yr', and the column name, use match to return the column index, cbind with the sequence of rows, extract the values from the 'pop' columns, divide by 'total' and assign it to 'perc' column

nm1 <- grep('pop', names(df), value = TRUE)
nm2 <- trimws(df$incident_month_yr, whitespace = '-.*')
nm3 <- trimws(nm1, whitespace = 'pop\\.')
df$perc <- df$total/df[nm1][cbind(seq_len(nrow(df)), match(nm2, nm3))]
df$perc
#[1] 0.07834101 0.10404624 0.10461538 0.10144928

In dplyr, an option is do rowwise, construct the column name from the 'incident_month_yr' with str_replace to capture the year part, append the 'pop.' as prefix, get the value and divide with 'total' column

library(stringr)
library(dplyr)
df %>%
   rowwise %>%
   mutate(perc = total/get(str_replace(incident_month_yr,
           "(\\d{4})-\\d+", 'pop.\\1'))) %>%
   ungroup

-output

# A tibble: 4 x 7
#  incident_month_yr pop.2017 pop.2018 pop.2019 pop.2020 total   perc
#  <chr>                <dbl>    <dbl>    <dbl>    <dbl> <dbl>  <dbl>
#1 2017-2                 434      334      477      474    34 0.0783
#2 2017-5                 346      336      346      366    36 0.104 
#3 2018-2                 345      325      145      341    34 0.105 
#4 2019-2                 357      345      345      300    35 0.101

Upvotes: 2

Conditional Calculation for a Column in R

Answers (3)

Related Questions