Reputation: 1546
I have data in untidy form. One of the columns contains combined information -
I wish to create a separate column for each.
Below is an example with the desired result.
library(tidyverse)
tbl <- tibble(
variable = c("a01_name", "a02_address", "a3_phone_number",
"b1a_total_income", "b01b_no_of_cars")
)
# desired result
desired_tbl <- tibble(
section = c("a", "a", "a", "b", "b"),
question_no = c(1, 2, 3, 1, 1),
sub_question_no = c(NA, NA, NA, "a", "b"),
question = c("name", "address", "phone_number", "total_income", "no_of_cars")
)
Upvotes: 1
Views: 148
Reputation: 887118
We can use extract
library(tidyr)
library(dplyr)
extract(tbl, variable, into = c("section", "question_no", "sub_question_no",
"question"), "^(\\D+)(\\d+)([a-z]+)?_(.*)", convert = TRUE) %>%
na_if("")
-output
# A tibble: 5 × 4
section question_no sub_question_no question
<chr> <int> <chr> <chr>
1 a 1 <NA> name
2 a 2 <NA> address
3 a 3 <NA> phone_number
4 b 1 a total_income
5 b 1 b no_of_cars
Upvotes: 2