ls22
ls22

Reputation: 23

Splitting one Column into two based upon when a specific character shows up

I have a dataset that includes a column with the following information:

Kay Ivey (R)

Mike Dunleavy (R)

Doug Ducey (R)...

Basically, the name of a Governor with the political party in parentheses next to it. How can I split the columnn into two with the name in one column and the Political Party designation in another. I have tried using the separate() function, but cannot figure out how to accomplish this goal.

Upvotes: 0

Views: 163

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389135

Using str_match from stringr :

stringr::str_match(df$name, '(.*?)\\s*\\((.*)\\)')

#     [,1]                [,2]            [,3]
#[1,] "Kay Ivey (R)"      "Kay Ivey"      "R" 
#[2,] "Mike Dunleavy (R)" "Mike Dunleavy" "R" 
#[3,] "Doug Ducey (R)"    "Doug Ducey"    "R" 

We capture the name of the governor in the first capture group and name of the political party which is in parenthesis in another capture group.


Using base R, extracting separately with similar regex :

transform(df, governor = sub('(.*?)\\s*\\(.*', '\\1', name),
              party = sub('.*\\((.*)\\)', '\\1', name))

#              name      governor party
#1      Kay Ivey (R)      Kay Ivey     R
#2 Mike Dunleavy (R) Mike Dunleavy     R
#3    Doug Ducey (R)    Doug Ducey     R

Upvotes: 0

akrun
akrun

Reputation: 887511

We can use extract to capture substring as a group ((...)) to create two columns i.e. capture all characters until the (, then capture the letter as second group

library(tidyr)
extract(df1, name, into = c("name", "designation"), "(.*)\\(([^)]+)\\)")
#            name designation
#1      Kay Ivey            R
#2 Mike Dunleavy            R
#3    Doug Ducey            R

Or with separate by specifying the sep as zero or more spaces (\\s*) followed by the bracket (\\( - escaped as it is a metacharacter) or the \\) and specify to "drop" the extra column (otherwise there would be a friendly warning)

df1 %>% 
  separate(name, into = c('name', 'designation'),
          sep="\\s*\\(|\\)$", extra = "drop")
 #          name designation
 #1      Kay Ivey           R
 #2 Mike Dunleavy           R
 #3    Doug Ducey           R

Or in base R with read.csv after creating a delimiter while replacing the () using gsub

read.csv(text = gsub("\\s\\(([^)]+)\\)", ",\\1", df1$name), 
     header = FALSE, col.names = c('name', 'designation'))
#          name designation
#1      Kay Ivey           R
#2 Mike Dunleavy           R
#3    Doug Ducey           R

data

df1 <- structure(list(name = c("Kay Ivey (R)", "Mike Dunleavy (R)", 
"Doug Ducey (R)")), class = "data.frame", row.names = c(NA, -3L
))

Upvotes: 1

Related Questions