Dan
Dan

Reputation: 103

How to mutate a ratio for two populations by year

   overseas_domestic_indicator ref_year count
   <chr>                          <dbl> <dbl>
 1 Domestic                        2014 17854
 2 Domestic                        2015 18371
 3 Domestic                        2016 18975
 4 Domestic                        2017 19455
 5 Domestic                        2018 19819
 6 Overseas                        2014  6491
 7 Overseas                        2015  7393
 8 Overseas                        2016  8594
 9 Overseas                        2017  9539
10 Overseas                        2018 10455

This is my data. I want something like:

ref_year Domestic/Overseas
2014    2.75
2015    ...
...     ...

But I don't know how to do this using tidyverse. I tried to use mutate but I don't know how to clarify the count for Domestic and Overseas. Thanks in advance.

Upvotes: 0

Views: 50

Answers (3)

akrun
akrun

Reputation: 887153

We can do a group by 'ref_year' and summarise by dividing the 'count' corresponding to 'Domestic' with that of 'Overseas' and reshape to 'wide' if needed

library(dplyr)
library(tidyr)
df1 %>%
   group_by(ref_year) %>%
   summarise(
      `Domestic/Overseas` = count[overseas_domestic_indicator == 'Domestic']/
      count[overseas_domestic_indicator == 'Overseas']) 
# A tibble: 5 x 2
#  ref_year `Domestic/Overseas`
#     <int>               <dbl>
#1     2014                2.75
#2     2015                2.48
#3     2016                2.21
#4     2017                2.04
#5     2018                1.90

Or arrange first and then do a division

df1 %>%
   arrange(ref_year, overseas_domestic_indicator) %>% 
   group_by(ref_year) %>% 
   summarise( `Domestic/Overseas` = first(count)/last(count))

Or with dcast from data.table

library(data.table)
dcast(setDT(df1), ref_year ~ overseas_domestic_indicator)[, 
      `Domestic/Overseas` := Domestic/Overseas][]

data

df1 <- structure(list(overseas_domestic_indicator = c("Domestic", "Domestic", 
"Domestic", "Domestic", "Domestic", "Overseas", "Overseas", "Overseas", 
"Overseas", "Overseas"), ref_year = c(2014L, 2015L, 2016L, 2017L, 
2018L, 2014L, 2015L, 2016L, 2017L, 2018L), count = c(17854L, 
18371L, 18975L, 19455L, 19819L, 6491L, 7393L, 8594L, 9539L, 10455L
)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10"))

Upvotes: 2

NotThatKindODr
NotThatKindODr

Reputation: 719

This should work too, multiple ways to do this

  df %>%
    pivot_wider(overseas_domestic_indicator, 
                names_from = overseas_domestic_indicator, 
                values_from = count) %>%
    mutate(Ratio = Domestic/Overseas)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

You can get the data in wide format first and then divide Domestic by Overseas

library(dplyr)

df %>%
  tidyr::pivot_wider(names_from = overseas_domestic_indicator, 
                     values_from = count) %>%
   mutate(ratio = Domestic/Overseas)


#  ref_year Domestic Overseas ratio
#     <int>    <int>    <int> <dbl>
#1     2014    17854     6491  2.75
#2     2015    18371     7393  2.48
#3     2016    18975     8594  2.21
#4     2017    19455     9539  2.04
#5     2018    19819    10455  1.90

Upvotes: 2

Related Questions