Reputation: 83
Hi everyone,
I used to tidy my data in SPSS and I'm trying to switch to R.
I have a data frame with women's birth histories for several years, and I need to create and add new variables based on them. Basically, I have one variable per year, from pa2010 back to pa1996, giving the number of kids a woman had at the beginning of that year. Those variables are numeric, and I want to create matching variables named ppa2010 to ppa1996 as factors, defining levels and labels at the same time. I did all this once, but I typed out each variable and each condition by hand. Over the last few days I have been trying to use loops to improve my code, but without success.
pa2010 <- c(0, 0, 2, 5, 3, 6, 8, 2, 1, 1)
pa2009 <- c(0, 0, 2, 4, 3, 6, 8 ,2, 0, 0)
pa2008 <- c(0, 0, 1, 4, 3, 5, 8, 1, 0, 0)
Parity <- data.frame(pa2010, pa2009, pa2008)
##I've been creating like this...
Parity %>% mutate(ppa2010 = ifelse(pa2010 >= 7, 7, pa2010),
                  ppa2009 = ifelse(pa2009 >= 7, 7, pa2009),
                  ppa2008 = ifelse(pa2008 >= 7, 7, pa2008)) %>%
  mutate_at(.vars = vars(ppa2010, ppa2009, ppa2008),
            .funs = ~ factor(., levels = c(0, 1, 2, 3, 4, 5, 6, 7),
                             labels = c("Parity 0", "Parity 1", "Parity 2", "Parity 3", "Parity 4", "Parity 5", "Parity 6", "Parity 7+")))
I would like to create the new variables using loops or some function that makes this faster. I also want to expand the data frame by adding these variables as factors, because later I will need to create bar charts for analysis, and this process will be repeated four or five times.
Upvotes: 1
Views: 230
Reputation: 15072
If you want to avoid reshaping your data, which isn't always easy in this format, you can use the _at functions in dplyr. The key thing to know about mutate_at, which I think is not super obvious, is that you can use it to produce new columns with a consistent naming style. So we can do:
Use mutate_at to truncate the pa columns for women who had more than 7 children. This basically means: apply a function that replaces values larger than 7 with 7 to each column that starts with "pa". The ~ syntax is a compact way to write a temporary (anonymous) function in purrr and dplyr.
Use mutate_at again, but this time with the function as a named element of a list, and with a function that makes a factor with the right levels and labels. The name will be appended to the original column names with an underscore separator. Note that we can use str_c to avoid typing out every label manually.
We want the columns to read ppa instead of having this suffix, so we can use rename_at to rename them all. First we remove the suffix and then add p to the beginning.
P.S. You may eventually find it easier to "tidy" your data so that each row is a woman-year instead of a woman.
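As a rough illustration of that woman-year layout, here is a minimal sketch. It assumes tidyr 1.0.0 or later for pivot_longer (which is newer than this answer), and the names woman, year, parity_group and parity_labels are made up for the example, not taken from the question.

```r
library(dplyr)
library(tidyr)

Parity <- data.frame(
  pa2010 = c(0, 0, 2, 5, 3, 6, 8, 2, 1, 1),
  pa2009 = c(0, 0, 2, 4, 3, 6, 8, 2, 0, 0),
  pa2008 = c(0, 0, 1, 4, 3, 5, 8, 1, 0, 0)
)

# Labels for the capped parity factor (hypothetical helper vector)
parity_labels <- c(paste("Parity", 0:6), "Parity 7+")

Parity_long <- Parity %>%
  # Assumption: the row position identifies the woman
  mutate(woman = row_number()) %>%
  # One row per woman-year; strip the "pa" prefix to get the year
  pivot_longer(
    cols = starts_with("pa"),
    names_to = "year",
    names_prefix = "pa",
    values_to = "parity"
  ) %>%
  mutate(
    year = as.integer(year),
    # Cap at 7 and label in a single step
    parity_group = factor(pmin(parity, 7), levels = 0:7, labels = parity_labels)
  )
```

In this shape the factor is created once instead of once per year, and a bar chart per year can be produced by grouping or faceting on year.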
library(tidyverse)
pa2010 <- c(0, 0, 2, 5, 3, 6, 8, 2, 1, 1)
pa2009 <- c(0, 0, 2, 4, 3, 6, 8 ,2, 0, 0)
pa2008 <- c(0, 0, 1, 4, 3, 5, 8, 1, 0, 0)
Parity <- data.frame(pa2010, pa2009, pa2008)
Parity %>%
  mutate_at(
    .vars = vars(starts_with("pa")),
    .funs = ~ if_else(. >= 7, 7, .)
  ) %>%
  mutate_at(
    .vars = vars(starts_with("pa")),
    .funs = list(
      parity = ~ . %>%
        factor(levels = 0:7, labels = str_c("Parity ", 0:7)) %>%
        fct_recode("Parity 7+" = "Parity 7")
    )
  ) %>%
  rename_at(
    .vars = vars(ends_with("_parity")),
    .funs = . %>%
      str_remove("_parity") %>%
      str_c("p", .)
  )
#>    pa2010 pa2009 pa2008   ppa2010   ppa2009   ppa2008
#> 1       0      0      0  Parity 0  Parity 0  Parity 0
#> 2       0      0      0  Parity 0  Parity 0  Parity 0
#> 3       2      2      1  Parity 2  Parity 2  Parity 1
#> 4       5      4      4  Parity 5  Parity 4  Parity 4
#> 5       3      3      3  Parity 3  Parity 3  Parity 3
#> 6       6      6      5  Parity 6  Parity 6  Parity 5
#> 7       7      7      7 Parity 7+ Parity 7+ Parity 7+
#> 8       2      2      1  Parity 2  Parity 2  Parity 1
#> 9       1      0      0  Parity 1  Parity 0  Parity 0
#> 10      1      0      0  Parity 1  Parity 0  Parity 0
Created on 2019-03-22 by the reprex package (v0.2.1)
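For readers on newer versions of dplyr, the same idea can probably be written more compactly with across() and its .names argument. This is only a sketch and assumes dplyr 1.0.0 or later, which did not exist when this answer was written; Parity_new and parity_labels are names made up for the example. Unlike the code above, it leaves the original pa columns uncapped and applies the cap only inside the new factor columns.

```r
library(dplyr)

Parity <- data.frame(
  pa2010 = c(0, 0, 2, 5, 3, 6, 8, 2, 1, 1),
  pa2009 = c(0, 0, 2, 4, 3, 6, 8, 2, 0, 0),
  pa2008 = c(0, 0, 1, 4, 3, 5, 8, 1, 0, 0)
)

# Labels for the capped parity factor (hypothetical helper vector)
parity_labels <- c(paste("Parity", 0:6), "Parity 7+")

Parity_new <- Parity %>%
  mutate(across(
    starts_with("pa"),
    # pmin() caps each value at 7 before it is turned into a factor
    ~ factor(pmin(.x, 7), levels = 0:7, labels = parity_labels),
    # "p{.col}" prefixes each source column name: pa2010 -> ppa2010
    .names = "p{.col}"
  ))
```

Because .names builds the new column names directly, the separate renaming step is not needed in this variant.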
Upvotes: 1