I'm trying to split these strings in R (column entries) into three separate columns:
João Moutinho Monaco, 30, M(C)
Clinton N'Jie Marseille, 23, FW
Frederic Sammaritano Dijon, 30, AM(LR)
so that the result looks like this:
Player                Team       Pos
João Moutinho         Monaco     30, M(C)
Clinton N'Jie         Marseille  23, FW
Frederic Sammaritano  Dijon      30, AM(LR)
I can find the positions of the characters using gregexpr and nchar, but I'm not sure how to use strsplit for this. Or is there a package that makes it easier?
We can read the vector into a data.frame with read.csv after inserting a delimiter with gsub:
# v1 is defined at the end of this answer
read.csv(text = gsub("^(\\S+\\s+\\S+)\\s+(\\S+),\\s+(.*)", "\\1;\\2;\\3", v1),
         sep = ";", header = FALSE,
         col.names = c("Player", "Team", "Pos"), stringsAsFactors = FALSE)
# Player Team Pos
#1 João Moutinho Monaco 30, M(C)
#2 Clinton N'Jie Marseille 23, FW
#3 Frederic Sammaritano Dijon 30, AM(LR)
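To see what the delimiter step does, here is a minimal sketch that runs the two steps separately on the same data. It uses sub() rather than gsub(); since the pattern is anchored with ^ it can match at most once per string, so both should give the same result here. The names delimited and df1 are just illustrative.

```r
v1 <- c("João Moutinho Monaco, 30, M(C)", "Clinton N'Jie Marseille, 23, FW",
        "Frederic Sammaritano Dijon, 30, AM(LR)")

# Step 1: rewrite each string so the three fields are separated by ";"
#   group 1 = first two words (player)
#   group 2 = the word just before the first comma (team)
#   group 3 = everything after the first ", " (age and position)
delimited <- sub("^(\\S+\\s+\\S+)\\s+(\\S+),\\s+(.*)", "\\1;\\2;\\3", v1)
print(delimited)

# Step 2: parse the ";"-separated lines as a headerless table
df1 <- read.csv(text = delimited, sep = ";", header = FALSE,
                col.names = c("Player", "Team", "Pos"),
                stringsAsFactors = FALSE)
print(df1)
```

Note that this pattern assumes exactly two words in the player name; a name like "Angel Di María" will not match, and sub() then returns the string unchanged.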
If there are more patterns (e.g. player names with three words) and the "Team" name is always a single word (the word right before the first ','), we can capture that word instead:
read.csv(text = sub("(\\s+[A-Za-z]+),(\\s+\\d+),(.*)", ";\\1;\\2\\3", v2),
         header = FALSE, sep = ";",
         strip.white = TRUE,  # drop the leading spaces captured by the groups
         col.names = c("Player", "Team", "Pos"), stringsAsFactors = FALSE)
# Player Team Pos
#1 João Moutinho Monaco 30 M(C)
#2 Clinton N'Jie Marseille 23 FW
#3 Frederic Sammaritano Dijon 30 AM(LR)
#4 Angel Di María PSG 28 M(CLR)
#5 Jean Michael Seri Nice 25 M(C)
# example data used in both calls above
v1 <- c("João Moutinho Monaco, 30, M(C)", "Clinton N'Jie Marseille, 23, FW",
        "Frederic Sammaritano Dijon, 30, AM(LR)")
v2 <- c(v1, "Angel Di María PSG, 28, M(CLR)", "Jean Michael Seri Nice, 25, M(C)")
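A base-R variant that skips the intermediate delimiter is utils::strcapture(), which applies a regex with capture groups and converts each group according to a zero-row prototype data frame. This is a minimal sketch that keeps the same assumption as above (the team is a single ASCII word right before the first comma). Unlike the read.csv versions, it also splits age and position into separate columns and converts Age to integer; the names pattern, proto and players are illustrative.

```r
v2 <- c("João Moutinho Monaco, 30, M(C)", "Clinton N'Jie Marseille, 23, FW",
        "Frederic Sammaritano Dijon, 30, AM(LR)",
        "Angel Di María PSG, 28, M(CLR)", "Jean Michael Seri Nice, 25, M(C)")

# Capture groups: player (everything before the team), team (one word
# directly before the first comma), age (digits), position (the rest)
pattern <- "^(.*)\\s+([A-Za-z]+),\\s+([0-9]+),\\s+(.*)$"

# Zero-row prototype: gives the column names and the type of each group
proto <- data.frame(Player = character(), Team = character(),
                    Age = integer(), Pos = character(),
                    stringsAsFactors = FALSE)

players <- utils::strcapture(pattern, v2, proto)
print(players)
```

Strings that do not match the pattern come back as a row of NAs rather than being silently shifted into the wrong columns.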
Or, using word() from stringr:
library(stringr)
data.frame(Player = word(v1, 1, 2),
           Team = sub(",", "", word(v1, 3)),
           # end = -1 counts back from the last word: word 4 to the end
           Pos = word(v1, 4, -1), stringsAsFactors = FALSE)
# Player Team Pos
#1 João Moutinho Monaco 30, M(C)
#2 Clinton N'Jie Marseille 23, FW
#3 Frederic Sammaritano Dijon 30, AM(LR)
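Since the question asks about strsplit: a minimal base-R sketch is to split each string at ", ", then split the first piece on spaces and treat the last word as the team. Like the sub() approach, this assumes the team name is a single word, while the player name may have any number of words. The helper split_one() is a hypothetical name used only for illustration.

```r
v1 <- c("João Moutinho Monaco, 30, M(C)", "Clinton N'Jie Marseille, 23, FW",
        "Frederic Sammaritano Dijon, 30, AM(LR)")

# Split at ", ": e.g. c("João Moutinho Monaco", "30", "M(C)")
parts <- strsplit(v1, ", ", fixed = TRUE)

split_one <- function(p) {
  # Split "Player Team" on spaces; assume the team is the last word
  words <- strsplit(p[1], " ", fixed = TRUE)[[1]]
  n <- length(words)
  c(Player = paste(words[-n], collapse = " "),
    Team   = words[n],
    Pos    = paste(p[-1], collapse = ", "))  # re-join "30" and "M(C)"
}

# One named character vector per string, stacked into a data.frame
df3 <- as.data.frame(do.call(rbind, lapply(parts, split_one)),
                     stringsAsFactors = FALSE)
print(df3)
```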