Tereza Bernardes

Reputation: 83

How to add new variables with similar names in a data frame using conditional statements from others?

Hi everyone,

I used to tidy my data using SPSS and I'm trying to change to R.

I have a data frame with women's birth histories for several years, and I need to create new variables from the existing ones. Basically, I have one variable for each year, from pa2010 back to pa1996, giving the number of children a woman had at the beginning of that year. Those variables are numeric, and I want to create corresponding variables named ppa2010 to ppa1996 as factors, defining the levels and labels at the same time. I have done all this once, but I typed out each variable and each condition by hand. Over the last few days I have been trying to use loops to improve my code, but without success.

pa2010 <- c(0, 0, 2, 5, 3, 6, 8, 2, 1, 1)
pa2009 <- c(0, 0, 2, 4, 3, 6, 8 ,2, 0, 0)
pa2008 <- c(0, 0, 1, 4, 3, 5, 8, 1, 0, 0)

Parity <- data.frame(pa2010, pa2009, pa2008)

## I've been creating them like this...
library(dplyr)

Parity %>% mutate(ppa2010 = ifelse(pa2010 >= 7, 7, pa2010),
                  ppa2009 = ifelse(pa2009 >= 7, 7, pa2009),
                  ppa2008 = ifelse(pa2008 >= 7, 7, pa2008)) %>% 
  # convert the three capped columns to labelled factors
  mutate_at(.vars = vars(ppa2010, ppa2009, ppa2008), 
            .funs = ~ factor(., levels = c(0, 1, 2, 3, 4, 5, 6, 7), 
                             labels = c("Parity 0", "Parity 1", "Parity 2", "Parity 3", "Parity 4", "Parity 5", "Parity 6", "Parity 7+")))

I would like to create the new variables using a loop or some function that makes this faster. I also want to add these variables to the data frame as factors, because later I will need to create bar charts for the analysis, and this process will be repeated four or five times.

Upvotes: 1

Views: 230

Answers (1)

Calum You

Reputation: 15072

If you want to avoid reshaping your data, which isn't always easy in this format, you can use the _at functions in dplyr. The key thing to know about mutate_at, which I think is not super obvious, is that you can use it to produce new columns with a consistent naming style. So we can do:

  1. Use mutate_at to cap the pa columns at 7 for women who had 7 or more children. This basically means: apply a function that replaces values larger than 7 with 7 to each column whose name starts with "pa". The ~ syntax is a compact way to write an anonymous function in purrr and dplyr.

  2. Use mutate_at again, but this time with the function as a named element of a list, and with a function that makes a factor with the right levels and labels. The name will be appended to the original column names with an underscore separator, so pa2010 gets a companion column pa2010_parity. Note that we can use str_c to avoid typing out every label manually.

  3. We want the columns to read ppa instead of having this suffix, so we can use rename_at to rename them all. First we remove the suffix and then add p to the beginning.

P.S. You may eventually find it easier to "tidy" your data so that each row is a woman-year, instead of a woman, however.
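
To make the P.S. a bit more concrete, here is a minimal sketch of what the woman-year layout could look like. It assumes tidyr 1.0.0 or later for pivot_longer, which is newer than this answer, and the names woman_id, year, parity, parity_group and parity_long are made up for illustration (the example data has no id column, so the row number stands in for one).

```r
library(dplyr)
library(tidyr)

Parity <- data.frame(
  pa2010 = c(0, 0, 2, 5, 3, 6, 8, 2, 1, 1),
  pa2009 = c(0, 0, 2, 4, 3, 6, 8, 2, 0, 0),
  pa2008 = c(0, 0, 1, 4, 3, 5, 8, 1, 0, 0)
)

parity_long <- Parity %>%
  # woman_id is a hypothetical identifier: the row number stands in for a real id
  mutate(woman_id = row_number()) %>%
  # one row per woman-year; the "pa" prefix is stripped to leave the year
  pivot_longer(
    cols = starts_with("pa"),
    names_to = "year",
    names_prefix = "pa",
    values_to = "parity"
  ) %>%
  mutate(
    year = as.integer(year),
    # cap at 7, then label; a single factor column covers every year
    parity_group = factor(
      pmin(parity, 7),
      levels = 0:7,
      labels = c(paste("Parity", 0:6), "Parity 7+")
    )
  )
```

In this shape there is only one column to recode no matter how many years you have, and year can be used directly for grouping or faceting in the bar charts.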

library(tidyverse)

pa2010 <- c(0, 0, 2, 5, 3, 6, 8, 2, 1, 1)
pa2009 <- c(0, 0, 2, 4, 3, 6, 8 ,2, 0, 0)
pa2008 <- c(0, 0, 1, 4, 3, 5, 8, 1, 0, 0)

Parity <- data.frame(pa2010, pa2009, pa2008)
Parity %>%
  mutate_at(
    .vars = vars(starts_with("pa")),
    .funs = ~ if_else(. >= 7, 7, .)
  ) %>%
  mutate_at(
    .vars = vars(starts_with("pa")),
    .funs = list(
      parity = ~ . %>%
        factor(levels = 0:7, labels = str_c("Parity ", 0:7)) %>%
        fct_recode("Parity 7+" = "Parity 7")
    )
  ) %>%
  rename_at(
    .vars = vars(ends_with("_parity")),
    .funs = . %>%
      str_remove("_parity") %>%
      str_c("p", .)
  )
#>    pa2010 pa2009 pa2008   ppa2010   ppa2009   ppa2008
#> 1       0      0      0  Parity 0  Parity 0  Parity 0
#> 2       0      0      0  Parity 0  Parity 0  Parity 0
#> 3       2      2      1  Parity 2  Parity 2  Parity 1
#> 4       5      4      4  Parity 5  Parity 4  Parity 4
#> 5       3      3      3  Parity 3  Parity 3  Parity 3
#> 6       6      6      5  Parity 6  Parity 6  Parity 5
#> 7       7      7      7 Parity 7+ Parity 7+ Parity 7+
#> 8       2      2      1  Parity 2  Parity 2  Parity 1
#> 9       1      0      0  Parity 1  Parity 0  Parity 0
#> 10      1      0      0  Parity 1  Parity 0  Parity 0

Created on 2019-03-22 by the reprex package (v0.2.1)
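
For readers on dplyr 1.0.0 or later, where the _at functions are superseded by across(), the same idea can probably be written in a single step. This is a sketch under that version assumption, not the code that produced the output above; parity_labels and Parity_new are names introduced here for illustration. One difference to be aware of: it leaves the original pa columns untouched (the 8s stay 8) and only caps the values inside the new ppa columns.

```r
library(dplyr)

Parity <- data.frame(
  pa2010 = c(0, 0, 2, 5, 3, 6, 8, 2, 1, 1),
  pa2009 = c(0, 0, 2, 4, 3, 6, 8, 2, 0, 0),
  pa2008 = c(0, 0, 1, 4, 3, 5, 8, 1, 0, 0)
)

# Labels for the eight levels; the last one groups 7 or more children
parity_labels <- c(paste("Parity", 0:6), "Parity 7+")

Parity_new <- Parity %>%
  mutate(
    across(
      starts_with("pa"),
      # cap at 7 and convert to a labelled factor in one step
      ~ factor(pmin(.x, 7), levels = 0:7, labels = parity_labels),
      # "p{.col}" names each new column by prefixing the old name with p
      .names = "p{.col}"
    )
  )
```

The .names argument takes the place of the rename_at step, since the new columns are created as ppa2010, ppa2009 and ppa2008 directly.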

Upvotes: 1
