SiH
SiH

Reputation: 1546

Split a column in a tibble into multiple columns to clean the data

I have data in untidy form. One of the columns contains combined information -

I wish to create a separate column for each.

Below is an example with the desired result.

library(tidyverse)


tbl <- tibble(
  variable = c("a01_name", "a02_address", "a3_phone_number",
                "b1a_total_income", "b01b_no_of_cars")
)

# desired result
desired_tbl <- tibble(
  section = c("a", "a", "a", "b", "b"),
  question_no = c(1, 2, 3, 1, 1),
  sub_question_no = c(NA, NA, NA, "a", "b"),
  question = c("name", "address", "phone_number", "total_income", "no_of_cars")
)


Upvotes: 1

Views: 148

Answers (1)

akrun
akrun

Reputation: 887118

We can use extract

library(tidyr)
library(dplyr)
extract(tbl, variable, into = c("section", "question_no", "sub_question_no", 
   "question"), "^(\\D+)(\\d+)([a-z]+)?_(.*)", convert = TRUE) %>%
  na_if("")

-output

# A tibble: 5 × 4
  section question_no sub_question_no question    
  <chr>         <int> <chr>           <chr>       
1 a                 1 <NA>            name        
2 a                 2 <NA>            address     
3 a                 3 <NA>            phone_number
4 b                 1 a               total_income
5 b                 1 b               no_of_cars  

Upvotes: 2

Related Questions