Clatty Cake

Reputation: 739

Split String In R Based On Character Location

I'm trying to split these strings in R (column entries) into three separate columns:

João Moutinho Monaco, 30,  M(C) 
Clinton N'Jie Marseille, 23,  FW
Frederic Sammaritano Dijon, 30,  AM(LR)

to become

Player                Team           Pos
João Moutinho         Monaco         30,  M(C) 
Clinton N'Jie         Marseille      23,  FW
Frederic Sammaritano  Dijon          30,  AM(LR)

I can find the location of the characters using gregexpr and nchar, but I'm not sure how to use strsplit with those positions. Or would another package make this easier?

Upvotes: 1

Views: 738

Answers (2)

akrun

Reputation: 887048

We can read the vector into a data.frame with read.csv after inserting a delimiter with gsub:

read.csv(text=gsub("^(\\S+\\s+\\S+)\\s+(\\S+),\\s+(.*)", 
       "\\1;\\2;\\3", v1), sep=";", header=FALSE, 
       col.names = c("Player", "Team", "Pos"), stringsAsFactors=FALSE)
#                Player      Team         Pos
#1        João Moutinho    Monaco   30,  M(C)
#2        Clinton N'Jie Marseille     23,  FW
#3 Frederic Sammaritano     Dijon 30,  AM(LR)

Update

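If there are more patterns (for example, player names with more than two words) and each "Team" name is a single word (i.e. the word just before the first ','), we can anchor on that comma instead: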

read.csv(text= sub("(\\s+[A-Za-z]+),(\\s+\\d+),(.*)", ";\\1;\\2\\3", v2), 
      header=FALSE, sep=";", col.names = c("Player", "Team", "Pos"), stringsAsFactors=FALSE)
#                Player       Team         Pos
#1        João Moutinho     Monaco    30  M(C)
#2        Clinton N'Jie  Marseille      23  FW
#3 Frederic Sammaritano      Dijon  30  AM(LR)
#4       Angel Di María        PSG   28 M(CLR)
#5    Jean Michael Seri       Nice     25 M(C)

data

v1 <- c("João Moutinho Monaco, 30,  M(C)", "Clinton N'Jie Marseille, 23,  FW", 
                    "Frederic Sammaritano Dijon, 30,  AM(LR)")
v2 <- c(v1, "Angel Di María PSG, 28, M(CLR)","Jean Michael Seri Nice, 25, M(C)")
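
As a possible alternative that skips the round trip through read.csv, utils::strcapture (available from R 3.4.0) can fill a data.frame directly from capture groups. This is only a sketch: the pattern is a variation written for this example, not the one used above, and it assumes that each team is a single word directly before the first comma and that player names contain no commas. The names `players` and `parsed` are illustrative.

```r
# Same sample data as v2 above, repeated so the block runs on its own
players <- c("João Moutinho Monaco, 30,  M(C)",
             "Clinton N'Jie Marseille, 23,  FW",
             "Frederic Sammaritano Dijon, 30,  AM(LR)",
             "Angel Di María PSG, 28, M(CLR)",
             "Jean Michael Seri Nice, 25, M(C)")

# Group 1: everything up to the last word before the first comma (Player)
# Group 2: the single word directly before the first comma (Team)
# Group 3: the rest of the string after the comma and whitespace (Pos)
pattern <- "^([^,]+)\\s+([^,[:space:]]+),\\s+(\\S.*)$"

# proto supplies the column names and types of the result
parsed <- strcapture(pattern, players,
                     proto = data.frame(Player = character(),
                                        Team = character(),
                                        Pos = character(),
                                        stringsAsFactors = FALSE))
parsed
```

Strings that do not match the pattern come back as a row of NA values, which makes unexpected input easy to spot.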

Upvotes: 2

Sotos

Reputation: 51582

The word approach from stringr (this takes the first two words as the player name, so it assumes two-word names):

library(stringr)
data.frame(Player = word(v1, 1, 2), 
             Team = sub(',','' ,word(v1, 3)), 
              Pos = word(v1, 4, 6), stringsAsFactors = FALSE)

#                Player      Team         Pos
#1        João Moutinho    Monaco   30,  M(C)
#2        Clinton N'Jie Marseille     23,  FW
#3 Frederic Sammaritano     Dijon 30,  AM(LR)
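
The split can also be done by character location in base R, which is closer to what the question describes. The following is a rough sketch, assuming the team is the last space-separated word before the first comma; the names `x`, `comma`, `before`, `last_space` and `by_pos` are made up for illustration.

```r
x <- c("João Moutinho Monaco, 30,  M(C)",
       "Clinton N'Jie Marseille, 23,  FW",
       "Frederic Sammaritano Dijon, 30,  AM(LR)")

# Position of the first comma in each string
comma <- regexpr(",", x, fixed = TRUE)

# Text before the first comma, e.g. "Clinton N'Jie Marseille"
before <- substr(x, 1, comma - 1)

# Position of the last space in that text (a space followed only by non-spaces)
last_space <- regexpr(" [^ ]*$", before)

by_pos <- data.frame(
  Player = substr(before, 1, last_space - 1),
  Team   = substr(before, last_space + 1, nchar(before)),
  Pos    = trimws(substr(x, comma + 1, nchar(x))),
  stringsAsFactors = FALSE
)
by_pos
```

Because it keys on the first comma rather than on word counts, this version does not depend on the player name having exactly two words.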

Upvotes: 1
