teogj

Reputation: 343

Separate column by matching strings

I have a data frame that looks more or less like this:

name_position
RAHEEM STERLINGForward
MARCUS RASHFORDForward
JORDAN HENDERSONMidfielder
JORDAN PICKFORDGoalkeeper
KYLE WALKERDefender

My goal is to split this column into two, so I've created a vector containing all the available positions:

positions <- c("Goalkeeper", "Defender", "Midfielder", "Forward")

I've tried functions such as separate(), extract() and even str_match(), but I haven't been able to get the output I want, which would look like this:

name                   position
RAHEEM STERLING        Forward
MARCUS RASHFORD        Forward
JORDAN HENDERSON       Midfielder
JORDAN PICKFORD        Goalkeeper
KYLE WALKER            Defender

Upvotes: 2

Views: 46

Answers (2)

Chris Ruehlemann

Reputation: 21400

Use str_extract from stringr:

library(stringr)
df1$position <- str_extract(df1$name_position, "(?<=[A-Z])[A-Z][a-z]+")

Result:

df1
               name_position   position
1     RAHEEM STERLINGForward    Forward
2     MARCUS RASHFORDForward    Forward
3 JORDAN HENDERSONMidfielder Midfielder
4  JORDAN PICKFORDGoalkeeper Goalkeeper
5        KYLE WALKERDefender   Defender

This solution uses positive lookbehind:

(?<=[A-Z]) if there is an upper-case letter immediately to the left ...

[A-Z][a-z]+ ... match the subsequent upper-case letter plus the one or more lower-case letters following it
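The question also asks for a separate name column, which the code above does not create. A minimal sketch of one way to get it, assuming every value ends in a capitalised position word, is to remove that same match, anchored at the end of the string with $ (the two-row df1 here is just a cut-down copy of the example data):

```r
library(stringr)

# Cut-down copy of the example data, for illustration only
df1 <- data.frame(name_position = c("RAHEEM STERLINGForward",
                                    "KYLE WALKERDefender"))

pos_pat <- "(?<=[A-Z])[A-Z][a-z]+"

# Position: the capitalised word that directly follows an upper-case letter
df1$position <- str_extract(df1$name_position, pos_pat)

# Name: whatever is left after dropping that word from the end of the string
df1$name <- str_remove(df1$name_position, paste0(pos_pat, "$"))
```

This relies on the names being fully upper-case; a name containing lower-case letters would need a different pattern.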

Upvotes: 2

akrun

Reputation: 886938

We can use separate with regex lookarounds, splitting at the zero-width position between an upper-case letter and a capitalised word:

library(dplyr)
library(tidyr)
df1 %>%
  separate(name_position, into = c("name", "position"), 
           sep="(?<=[A-Z])(?=[A-Z][a-z])")
#             name   position
#1  RAHEEM STERLING    Forward
#2  MARCUS RASHFORD    Forward
#3 JORDAN HENDERSON Midfielder
#4  JORDAN PICKFORD Goalkeeper
#5      KYLE WALKER   Defender
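Since the question mentions extract(), the same split can also be sketched with tidyr::extract and two capture groups. This is an illustrative variant rather than part of the original answer, and it assumes the name is fully upper-case and the position is a single capitalised word at the end of the string:

```r
library(tidyr)

df1 <- data.frame(name_position = c("RAHEEM STERLINGForward",
                                    "JORDAN HENDERSONMidfielder",
                                    "KYLE WALKERDefender"))

# Group 1: everything up to and including the last upper-case letter of the name
# Group 2: the trailing capitalised word (the position)
out <- tidyr::extract(df1, name_position,
                      into = c("name", "position"),
                      regex = "^(.*[A-Z])([A-Z][a-z]+)$")
out
```

Rows that do not match the regex come back as NA in both new columns, which makes unexpected values easy to spot.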

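If we have a custom vector of positions, one option is to collapse it into a single regex pattern, with the alternatives separated by |, and use that pattern both to remove and to extract the position: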

library(stringr)
pat <- str_c(positions, collapse="|")
df1 %>% 
   transmute(name = str_remove(name_position, pat),
            position = str_extract(name_position, pat))
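One caveat with an unanchored alternation is that it matches the first occurrence anywhere in the string. A hedged variant, assuming the position always comes last, wraps the alternatives in a group and anchors it with $ (pat_end is a name introduced here for illustration):

```r
library(stringr)

positions <- c("Goalkeeper", "Defender", "Midfielder", "Forward")
df1 <- data.frame(name_position = c("RAHEEM STERLINGForward",
                                    "JORDAN PICKFORDGoalkeeper",
                                    "KYLE WALKERDefender"))

# Builds "(Goalkeeper|Defender|Midfielder|Forward)$",
# so only a position at the very end of the string can match
pat_end <- str_c("(", str_c(positions, collapse = "|"), ")$")

df1$name     <- str_remove(df1$name_position, pat_end)
df1$position <- str_extract(df1$name_position, pat_end)
```

Values whose ending is not in positions are left unchanged in name and get NA in position.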

data

df1 <- structure(list(name_position = c("RAHEEM STERLINGForward", "MARCUS RASHFORDForward", 
"JORDAN HENDERSONMidfielder", "JORDAN PICKFORDGoalkeeper", 
"KYLE WALKERDefender"
)), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 2
