Reputation: 98
I am relatively new to R and am trying to build a population pyramid. I need the population data for males and females side by side in two variables (popMale, popFemale). Currently Sex is a factor with two levels. How do I convert these two factor levels into two new variables (popMale, popFemale)? I would appreciate any help. Here is a dput snippet of my data:
structure(list(V1 = c("Location", "Dominican Republic", "Dominican Republic",
"Dominican Republic", "Dominican Republic"), V2 = c("Sex", "Female",
"Female", "Male", "Male"), V3 = c("Age", "0-4", "5-9", "0-4",
"5-9"), V4 = c(1950L, 217L, 164L, 223L, 167L), V5 = c(1955L,
277L, 199L, 286L, 204L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))
Upvotes: 0
Views: 448
Reputation: 124983
As your data contains the column names in the first row, the first step is to name the columns according to that row and then drop it. After doing so, convert your data to long (tidy) format, i.e. move the years and population numbers into separate columns, using e.g. tidyr::pivot_longer. Finally, you could use tidyr::pivot_wider to spread the data for males and females into separate columns.
Note: Depending on the next steps in your analysis, the last step isn't really needed and may actually complicate plotting a population pyramid.
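# Use the first row as column names, then drop that row
names(df) <- as.character(df[1, ])
df <- df[-1, ]
library(tidyr)
df %>%
  # Long format: one row per Location, Sex, Age and Year
  pivot_longer(matches("^\\d+"), names_to = "Year", values_to = "pop") %>%
  # Wide format: one population column per sex (popFemale, popMale)
  pivot_wider(names_from = Sex, values_from = pop, names_glue = "pop{Sex}")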
#> # A tibble: 4 × 5
#>   Location           Age   Year  popFemale popMale
#>   <chr>              <chr> <chr>     <int>   <int>
#> 1 Dominican Republic 0-4   1950        217     223
#> 2 Dominican Republic 0-4   1955        277     286
#> 3 Dominican Republic 5-9   1950        164     167
#> 4 Dominican Republic 5-9   1955        199     204
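To illustrate the note above, here is a minimal sketch of a pyramid drawn directly from the long format, skipping pivot_wider. It assumes ggplot2 is installed and a version recent enough (3.3.0 or later) to draw horizontal bars from `geom_col()` when the discrete variable is on the y axis. The names `long`, `pop_signed` and `p` are made up for this example, and the data frame is rebuilt by hand from the dput snippet with the header row already applied.

```r
library(tidyr)
library(ggplot2)

# Data from the question, with the first row already used as column names
df <- data.frame(
  Location = "Dominican Republic",
  Sex = c("Female", "Female", "Male", "Male"),
  Age = c("0-4", "5-9", "0-4", "5-9"),
  `1950` = c(217L, 164L, 223L, 167L),
  `1955` = c(277L, 199L, 286L, 204L),
  check.names = FALSE  # keep "1950" and "1955" as column names
)

# Long format: one row per Location, Sex, Age and Year
long <- pivot_longer(df, matches("^\\d+"), names_to = "Year", values_to = "pop")

# Negate the male counts so their bars extend to the left of zero
long$pop_signed <- ifelse(long$Sex == "Male", -long$pop, long$pop)

p <- ggplot(long, aes(x = pop_signed, y = Age, fill = Sex)) +
  geom_col() +
  facet_wrap(~Year) +
  scale_x_continuous(labels = abs) +  # show absolute values on the axis
  labs(x = "Population", y = "Age group")
p
```

With Sex kept as a single column it can be mapped straight to `fill`, which is why the wide popMale/popFemale layout is not required for this kind of plot.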
Upvotes: 0