markpc
markpc

Reputation: 11

R pivot longer with both annual and monthly data

I'm unsure how to structure my pivot longer command when I have both annual and monthly data. For example I have:

wide <- data.frame(region_name = character(),    # Create empty data frame
                    total_population_2019 = numeric(),
                    total_population_2020 = numeric(),
                   mean_temperature_2019_1 = numeric(),
                   mean_temperature_2019_2 = numeric(),
                   mean_temperature_2020_1 = numeric(),
                   mean_temperature_2020_2 = numeric(),
                    stringsAsFactors = FALSE)

wide[1, ] <- list("funville", 50000, 51250, 26.3, 24.6, 25.7, 24.9)

region_name total_population_2019 total_population_2020 mean_temperature_2019_1 mean_temperature_2019_2 mean_temperature_2020_1 mean_temperature_2020_2
 funville                50000             51250              26.3              24.6              25.7              24.9

I'm able to pivot on the monthly columns using spread:

long <- pivot_longer(wide, cols = 4:7, names_to = c("layer" ,"year", "month"),
                     names_pattern = "(.*)_(.*)_?_(.*)") %>%
  group_by(layer) %>%
  mutate(n = 1:n()) %>%
  spread(layer, value) %>%
  select(-n)

which gives

  region_name total_population_2019 total_population_2020 year  month mean_temperature
1 funville                    50000                 51250 2019  1                 26.3
2 funville                    50000                 51250 2019  2                 24.6
3 funville                    50000                 51250 2020  1                 25.7
4 funville                    50000                 51250 2020  2                 24.9

I'd like to now have a population column where the values are attributed for each row/month that falls in that year, ideally would look like:

desired.df <- data.frame(region_name = c("funville", "funville", "funville", "funville"),
                         year = c("2019", "2019", "2020", "2020"),
                         month = c("1", "2", "1", "2"),
                         population = c("50000", "50000", "51250", "51250"),
                         mean_temperature = c("26.3", "24.6", "25.7", "24.9"))

which gives

  region_name year  month population mean_temperature           
1 funville    2019  1     50000      26.3            
2 funville    2019  2     50000      24.6            
3 funville    2020  1     51250      25.7            
4 funville    2020  2     51250      24.9 

Does anyone have a solution? Thanks in advance

Upvotes: 1

Views: 423

Answers (1)

stefan
stefan

Reputation: 124203

One option would be to use the names_pattern argument and the special .value. To make this work I first add a helper month to your population columns. Additionally I use tidyr::fill to fill up the population column:

library(dplyr)
library(tidyr)

wide |> 
  rename_with(~ paste(.x, 1, sep = "_"), starts_with("total")) |> 
  pivot_longer(-region_name, 
               names_to = c(".value", "year", "month"),
               names_pattern = "^(.*?)_(\\d+)_(\\d+)$") |> 
  group_by(year) |> 
  fill(total_population) |> 
  arrange(year)
#> # A tibble: 4 × 5
#> # Groups:   year [2]
#>   region_name year  month total_population mean_temperature
#>   <chr>       <chr> <chr>            <dbl>            <dbl>
#> 1 funville    2019  1                50000             26.3
#> 2 funville    2019  2                50000             24.6
#> 3 funville    2020  1                51250             25.7
#> 4 funville    2020  2                51250             24.9

Upvotes: 1

Related Questions