Reputation: 23
I have a dataset that includes a column with the following information:
Kay Ivey (R)
Mike Dunleavy (R)
Doug Ducey (R)...
Basically, the name of a Governor with the political party in parentheses next to it. How can I split the columnn into two with the name in one column and the Political Party designation in another. I have tried using the separate() function, but cannot figure out how to accomplish this goal.
Upvotes: 0
Views: 163
Reputation: 389135
Using str_match
from stringr
:
stringr::str_match(df$name, '(.*?)\\s*\\((.*)\\)')
# [,1] [,2] [,3]
#[1,] "Kay Ivey (R)" "Kay Ivey" "R"
#[2,] "Mike Dunleavy (R)" "Mike Dunleavy" "R"
#[3,] "Doug Ducey (R)" "Doug Ducey" "R"
We capture the name of the governor in the first capture group and name of the political party which is in parenthesis in another capture group.
Using base R, extracting separately with similar regex :
transform(df, governor = sub('(.*?)\\s*\\(.*', '\\1', name),
party = sub('.*\\((.*)\\)', '\\1', name))
# name governor party
#1 Kay Ivey (R) Kay Ivey R
#2 Mike Dunleavy (R) Mike Dunleavy R
#3 Doug Ducey (R) Doug Ducey R
Upvotes: 0
Reputation: 887511
We can use extract
to capture substring as a group ((...)
) to create two columns i.e. capture all characters until the (
, then capture the letter as second group
library(tidyr)
extract(df1, name, into = c("name", "designation"), "(.*)\\(([^)]+)\\)")
# name designation
#1 Kay Ivey R
#2 Mike Dunleavy R
#3 Doug Ducey R
Or with separate
by specifying the sep
as zero or more spaces (\\s*
) followed by the bracket (\\(
- escaped as it is a metacharacter) or the \\)
and specify to "drop" the extra
column (otherwise there would be a friendly warning)
df1 %>%
separate(name, into = c('name', 'designation'),
sep="\\s*\\(|\\)$", extra = "drop")
# name designation
#1 Kay Ivey R
#2 Mike Dunleavy R
#3 Doug Ducey R
Or in base R
with read.csv
after creating a delimiter while replacing the ()
using gsub
read.csv(text = gsub("\\s\\(([^)]+)\\)", ",\\1", df1$name),
header = FALSE, col.names = c('name', 'designation'))
# name designation
#1 Kay Ivey R
#2 Mike Dunleavy R
#3 Doug Ducey R
df1 <- structure(list(name = c("Kay Ivey (R)", "Mike Dunleavy (R)",
"Doug Ducey (R)")), class = "data.frame", row.names = c(NA, -3L
))
Upvotes: 1