Reputation: 1809
I have the following data:
pop.2017 <- c(434,346,345,357)
pop.2018 <- c(334,336,325,345)
pop.2019 <- c(477,346,145,345)
pop.2020 <- c(474,366,341,300)
total <- c(34,36,34,35)
incident_month_yr <- c("2017-2","2017-5","2018-2","2019-2")
df <- data.frame(incident_month_yr,pop.2017,pop.2018,pop.2019,pop.2020,total)
df['perc'] <- NA
For rows where incident_month_yr contains 2017, I want perc
to equal total/pop.2017
For rows where incident_month_yr contains 2018, I want perc
to equal total/pop.2018
For rows where incident_month_yr contains 2019, I want perc
to equal total/pop.2019
For rows where incident_month_yr contains 2020, I want perc
to equal total/pop.2020
I've tried this:
df$perc[grepl(2017,df$incident_month_yr)] <- df$total/df$pop.2017
df$perc[grepl(2018,df$incident_month_yr)] <- df$total/df$pop.2018
df$perc[grepl(2019,df$incident_month_yr)] <- df$total/df$pop.2019
df$perc[grepl(2020,df$incident_month_yr)] <- df$total/df$pop.2020
However, it's not applying the calculations to specific rows like I want. How can I do this?
Upvotes: 2
Views: 72
Reputation: 21908
You can use the following solution:
library(dplyr)
library(stringr)
df %>%
mutate(perc = ifelse(str_detect(incident_month_yr, "2017"), total/pop.2017,
ifelse(str_detect(incident_month_yr, "2018"), total/pop.2018,
total/pop.2019)))
incident_month_yr pop.2017 pop.2018 pop.2019 pop.2020 total perc
1 2017-2 434 334 477 474 34 0.07834101
2 2017-5 346 336 346 366 36 0.10404624
3 2018-2 345 325 145 341 34 0.10461538
4 2019-2 357 345 345 300 35 0.10144928
Special Thanks to dear @akrun
We can also replace str_detect
with grepl
function from base R to use fewer packages and use case_when
in place of ifelse
as an unnested alternative.
df %>%
mutate(perc = case_when(
grepl("2017", incident_month_yr) ~ total/pop.2017,
grepl("2018", incident_month_yr) ~ total/pop.2018,
TRUE ~ total/pop.2019
))
incident_month_yr pop.2017 pop.2018 pop.2019 pop.2020 total perc
1 2017-2 434 334 477 474 34 0.07834101
2 2017-5 346 336 346 366 36 0.10404624
3 2018-2 345 325 145 341 34 0.10461538
4 2019-2 357 345 345 300 35 0.10144928
Upvotes: 3
Reputation: 6155
Here are two approaches, one in base R and one using tidy data. The provided data is not tidy, that's why base R looks uncomfortable:
# Define the target
target <- c(0.07834101, 0.10404624, 0.10461538, 0.10144928)
That is our goal, calculating target
.
First, use base R and ifelse
:
result1 <- with(df,
ifelse(grepl(2017, incident_month_yr),
total/pop.2017,
ifelse(grepl(2018, incident_month_yr),
total/pop.2018,
ifelse(grepl(2019, incident_month_yr),
total/pop.2019,
ifelse(grepl(2020, incident_month_yr),
total/pop.2020,
NA)))))
identical(round(result1, 4), round(target, 4))
#> [1] TRUE
And, the tidy way, reshaping into tidy data and calculating the result:
library(dplyr)
library(tidyr)
result2 <- df %>% pivot_longer(starts_with("pop."), names_to = "pop", names_prefix = "pop.") %>%
filter(substr(incident_month_yr, 1, 4) == pop) %>%
mutate(perc = total/value) %>%
pull(perc)
identical(round(result2, 4), round(target, 4))
#> [1] TRUE
Upvotes: 0
Reputation: 886938
We can do this with match
. Get the column names that have 'pop' substring ('nm1)', remove the characters that are not year from 'incident_month_yr', and the column name, use match
to return the column index, cbind
with the sequence of rows, extract the values from the 'pop' columns, divide by 'total' and assign it to 'perc' column
nm1 <- grep('pop', names(df), value = TRUE)
nm2 <- trimws(df$incident_month_yr, whitespace = '-.*')
nm3 <- trimws(nm1, whitespace = 'pop\\.')
df$perc <- df$total/df[nm1][cbind(seq_len(nrow(df)), match(nm2, nm3))]
df$perc
#[1] 0.07834101 0.10404624 0.10461538 0.10144928
In dplyr
, an option is do rowwise
, construct the column name from the 'incident_month_yr' with str_replace
to capture the year part, append the 'pop.' as prefix, get
the value and divide with 'total' column
library(stringr)
library(dplyr)
df %>%
rowwise %>%
mutate(perc = total/get(str_replace(incident_month_yr,
"(\\d{4})-\\d+", 'pop.\\1'))) %>%
ungroup
-output
# A tibble: 4 x 7
# incident_month_yr pop.2017 pop.2018 pop.2019 pop.2020 total perc
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2017-2 434 334 477 474 34 0.0783
#2 2017-5 346 336 346 366 36 0.104
#3 2018-2 345 325 145 341 34 0.105
#4 2019-2 357 345 345 300 35 0.101
Upvotes: 2