William Schumann
William Schumann

Reputation: 21

Struggling to mutate columns elegantly in tidyverse

I am working with PUMS data and I am using tidycensus to translate the values of certain columns. However, this adds a new column with structure: columnname_label. I want to use mutate to replace the original column with the new translated values in columnname_label. Here is example of my data frame:

df <-data.frame(Region= c(1,2,1,4,3,1), 
                Region_label = c("North", "South", "North", "West", "East", "North"),
                Broadband = c(0,1,0,0,0,1)
                Broadband_label = c("No","Yes","No","No","No","Yes")
                Hispeed = c(1,1,0,0,1,0)
                Hispeed_label = c("Yes", "Yes","No","No","Yes","No")


I know I can write out code like this using tidycensus:

library(tidyverse)

recode <- df %>% mutate(Region = Region_label) %>% mutate(Broadband = Broadband_label) %>% 
          mutate(Hispeed = Hispeed_label)

However, I have 66 columns that need to be matched to the "_label" columns. Is there a more elegant way to perform this instead of writing 66 mutate statements?

I have tried writing a loop using mutate_at, but it does not work.

subset1 <- grep('*label*',names(df),value = TRUE) 

name <- names(df)

for (i in subset1) {
  

final <- final_house %>% mutate_at(vars(matches(trimws(name,"right","\\_label"))),i) 

}

This does not work.

Thank you!

Upvotes: 2

Views: 110

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269481

Using across this assigns to each column whose name is X the column whose name is X_label. If there can be columns that do not have a corresponding _label column then replace the first argument of across with sub("_label$", "", subset1) where subset1 is defined in the question.

 df %>% 
   mutate(across(!ends_with("_label"), 
                 ~ cur_data()[[paste0(cur_column(), "_label")]]))

giving:

  Region Region_label Broadband Broadband_label Hispeed Hispeed_label
1  North        North        No              No     Yes           Yes
2  South        South       Yes             Yes     Yes           Yes
3  North        North        No              No      No            No
4   West         West        No              No      No            No
5   East         East        No              No     Yes           Yes
6  North        North       Yes             Yes      No            No

Note that the above is easy to do with just base R:

replace(df, sub("_label$", "", subset1), df[subset1])

or adding a pipeline:

df %>% replace(sub("_label$", "", subset1), .[subset1])

Upvotes: 1

LMc
LMc

Reputation: 18632

library(dplyr)

df %>% 
  select(ends_with("label")) %>% 
  rename_with(~ gsub("_label","", .))

Output

  Region Broadband Hispeed
1  North        No     Yes
2  South       Yes     Yes
3  North        No      No
4   West        No      No
5   East        No     Yes
6  North       Yes      No

Upvotes: 1

j3ypi
j3ypi

Reputation: 1557

First, select all columns, which end with "label". Then you can extract the string until the first underscore (_) using a reprex.

df %>% 
  select(ends_with("label")) %>% 
  rename_with(~ str_extract(.x, "^[^_]+(?=_)"))

Note that you need dplyr version 1.0.0 or newer for that.

Upvotes: 1

Related Questions