Reputation: 466
I have a dataset that contains Congressional members' names followed by their state and district number. Ideally, I would like to split each string into new columns containing the representative's name, state, and district number. I can successfully split one string, but because the strings differ in length, the same split does not work for the others. Below is a reproducible sample.
library(tibble)

current_data <- tibble(
  names = c("Ralph Abraham La. 5", "Robert B. Aderholt Ala. 4", "Rick W. Allen Ga. 12", "Mark Amodei Nev. 2",
            "Kelly Armstrong N.D. 0", "Jodey Arrington Tex. 19"),
  party = c("R", "R", "R", "R", "R", "R"),
  vote = c("N", "N", "N", "N", "N", "N"))
Here is a sample of what I would like it to look like.
desired_data <- tibble(
  names = c("Ralph Abraham", "Robert B. Aderholt", "Rick W. Allen", "Mark Amodei",
            "Kelly Armstrong", "Jodey Arrington"),
  state = c("La.", "Ala.", "Ga.", "Nev.", "N.D.", "Tex."),
  district_num = c(5, 4, 12, 2, 0, 19),
  party = c("R", "R", "R", "R", "R", "R"),
  vote = c("N", "N", "N", "N", "N", "N"))
Hope y'all can help me out. Thank you!
Upvotes: 1
Views: 50
Reputation: 79338
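library(tidyverse)  # provides separate() (tidyr) and the %>% pipe
# Split at the space before the state token and at the space before the district number
current_data %>%
  separate(names, c("names", "state", "district"), "\\s(?=\\S+\\s+\\d)|\\s+(?=\\d)")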
# A tibble: 6 x 5
  names              state district party vote
  <chr>              <chr> <chr>    <chr> <chr>
1 Ralph Abraham      La.   5        R     N
2 Robert B. Aderholt Ala.  4        R     N
3 Rick W. Allen      Ga.   12       R     N
4 Mark Amodei        Nev.  2        R     N
5 Kelly Armstrong    N.D.  0        R     N
6 Jodey Arrington    Tex.  19       R     N
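Note that the district column above is still character, while the desired output has a numeric district_num. One possible variation, offered only as a sketch, uses tidyr::extract() with capture groups and convert = TRUE. It assumes every string has the layout "<name> <state abbreviation> <district number>" and that the state abbreviation contains no internal spaces; the variable name result is only illustrative.

```r
library(tibble)
library(tidyr)

# Same sample data as in the question
current_data <- tibble(
  names = c("Ralph Abraham La. 5", "Robert B. Aderholt Ala. 4", "Rick W. Allen Ga. 12",
            "Mark Amodei Nev. 2", "Kelly Armstrong N.D. 0", "Jodey Arrington Tex. 19"),
  party = rep("R", 6),
  vote = rep("N", 6)
)

# Three capture groups: everything up to the last two tokens (name),
# the second-to-last token (state), and the trailing digits (district).
# Assumption: the state abbreviation is a single whitespace-free token.
result <- tidyr::extract(
  current_data, names,
  into = c("names", "state", "district_num"),
  regex = "^(.*)\\s+(\\S+)\\s+(\\d+)$",
  convert = TRUE  # type-converts the new columns, so district_num becomes integer
)

result
```

Rows that do not match the pattern would come back as NA in the new columns, so it may be worth checking for missing values if the real data is less regular than the sample.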
Upvotes: 1