Reputation: 477
I want to split a column in my dataset into multiple columns based on assigned separated sign: "|".
My dataset looks like this:
vname<-c("x1", "x2", "x3","x4")
label<-c("1,Eng |2,Man", "1,yes|2,no|3,dont know", "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know")
df<-data.frame(vname, label)
So, I want to split column: label in to multiple columns based on sign" "|". I used stringr::str_split to do that, and my code is like:
cd2<-df %>%
select(vname, everything())%>%
mutate(label=str_split(value, " \\| "))
However, the result returns a vector in column of label. It looks like this:
vname label
x1 c("1,Eng","2,Man")
x2 c("1,yes","2,no", "3,dont know")
....
My question is that how to get a expected result like this:
vname label1 label2 label3 label4
x1 1,Eng 2,Man
x2 1,yes 2,no 3, dont know
x3 1,never. 2,sometimes, 3,usually. 4,always
...
Thanks a lot for help~~~
dput(head(cd2, 10))
structure(list(variable = c("x2", "x8", "x9", "x10", "x13", "x14",
"x15", "x20", "x22", NA), vname = c("consenting_language", "county",
"respondent", "residence", "language", "int_q1", "int_q2", "int_q4",
"int_q5", "int_q6"), label = c("Consenting Language", "County",
"Respondent Type", "Residence", "Interview language ", "1. What was your sex at birth?",
"2. How would you describe your current sexual orientation?",
"4. What is the highest level of education you completed?", "5. What is your current marital status?",
"<div class=\"rich-text-field-label\"><p>6. Is <span style=\"color: #3598db;\">regular </span>your partner currently living with you now, or does s/he stay elsewhere?</p></div>"
), value = c("1, English | 2, Kiswahili", "1, County011 | 2, County014 | 3, County002| 4, County006 | 5, County010 | 6, County008 | 7, County005 | 8, County003 | 9, County012| 10, County004 | 11, County009 | 12, County001 | 13, County015 | 14, County007 | 15, County012",
"1, FSW | 2, MSM | 3, AGYW", "1, Urban | 2, Peri urban | 3, Rural",
"1, English | 2, Kiswahili", "1, Male | 2, Female", "1, Homosexual/Gay | 2, Bisexual | 3, Heterosexual/Straight | 4, Transgender Male | 5, Transgender Female | 96, Other | 98, Don't Know | 99, Decline to state",
"1,None | 2,Nursery/kindergarten | 3,Primary | 4,Secondary | 5,Tertiary/Vocational | 6,College/University | 7,Adult education | 96,Other",
"1, Single/Not married | 2, Married | 3, Cohabiting | 4, Divorced | 5, Separated | 6, Widowed | 7, In a relationship",
"1, Living with You | 2, Staying Elsewhere")), row.names = c(NA,
10L), class = "data.frame")
Upvotes: 1
Views: 434
Reputation: 19867
you can do this simply by using separate()
from {tidyr}
library(tidyverse)
dat %>% as_tibble() %>%
separate(value, sep = "\\s*\\|\\s*",
into = paste0("value", seq(str_count(.$value, "\\s*\\|\\s*"))))
Upvotes: 3
Reputation: 886978
With the code used, it is returning a list
(perhaps we have to make sure that there are zero or more spaces as in the example there was no space), we can unnest_wider
to convert to new columns
library(dplyr)
library(stringr)
library(tidyr)
df %>%
select(vname, everything())%>%
mutate(label=str_split(label, "\\s*\\|\\s*")) %>%
unnest_wider(where(is.list), names_sep = "")
-output
# A tibble: 4 × 5
vname label1 label2 label3 label4
<chr> <chr> <chr> <chr> <chr>
1 x1 1,Eng 2,Man <NA> <NA>
2 x2 1,yes 2,no 3,dont know <NA>
3 x3 1,never 2,sometimes 3,usually 4,always
4 x4 1,yes 2,No 3,dont know <NA>
This maybe also done with separate
library(tidyr)
df %>%
separate(label, into = str_c('label',
seq_len(max(str_count(.$label, fixed("|"))) + 1)),
sep = "\\|", fill = "right")
-output
vname label1 label2 label3 label4
1 x1 1,Eng 2,Man <NA> <NA>
2 x2 1,yes 2,no 3,dont know <NA>
3 x3 1,never 2,sometimes 3,usually 4,always
4 x4 1,yes 2,No 3,dont know <NA>
Or using the OP's data 'cd2' - added the case for spaces before and after the |
cd2new <- cd2 %>%
separate(value, into = str_c('value',
seq_len(max(str_count(.$value, fixed("|"))) + 1)),
sep = "\\s*\\|\\s*", fill = "right")
-output
> head(cd2new, 2)
variable vname label value1 value2 value3 value4 value5
1 x2 consenting_language Consenting Language 1, English 2, Kiswahili <NA> <NA> <NA>
2 x8 county County 1, County011 2, County014 3, County002 4, County006 5, County010
value6 value7 value8 value9 value10 value11 value12 value13
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 6, County008 7, County005 8, County003 9, County012 10, County004 11, County009 12, County001 13, County015
value14 value15
1 <NA> <NA>
2 14, County007 15, County012
Upvotes: 6