Reputation: 645
I have a column that I'm looking to split into several columns. I'm not that familiar with regexp, so I'm not sure of the right way to go about this.
Sample data
df <- tibble::tribble(
~player,
"Eloy Jimenez OF CHW",
"Fernando Tatis Jr SS SD"
)
I'm looking to split the column at the point where the all-caps codes start. For example:
output_df <- tibble::tribble(
~col1, ~col2, ~col3,
"Eloy Jimenez", "OF", "CHW",
"Fernando Tatis Jr", "SS", "SD"
)
Thanks in advance.
Upvotes: 1
Views: 68
Reputation: 6954
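You can use a lookahead to split on the space that precedes a run of two or more capitals:
library(stringr)
# the space is consumed by the split; the capitals after it are only "looked at"
str_split(string = "Eloy Jimenez OF CHW",
pattern = " (?=[[:upper:]]{2,})") %>%
unlist() %>%
trimws() %>%
stringi::stri_remove_empty()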
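To make the lookahead idea concrete, here is a minimal base R sketch that applies the same kind of pattern to both sample rows. It assumes every row splits into exactly three pieces; the names `players`, `parts`, and `result` are only illustrative.

```r
# Input vector copied from the question's sample data
players <- c("Eloy Jimenez OF CHW", "Fernando Tatis Jr SS SD")

# " (?=[A-Z]{2,})" consumes one space, but only when that space is
# followed by two or more capitals; the capitals are not consumed
parts <- strsplit(players, " (?=[A-Z]{2,})", perl = TRUE)

# Assumes three pieces per row, so the list can be bound into a matrix
result <- do.call(rbind, parts)
colnames(result) <- c("col1", "col2", "col3")
result
```

Because the lookahead is zero-width, "OF" and "CHW" stay intact, and "Jr" is left alone since it is not followed by a second capital.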
Upvotes: 1
Reputation: 1688
Or using data.table
library(data.table)
df <- tibble::tribble(
~player,
"Eloy Jimenez OF CHW",
"Fernando Tatis Jr SS SD"
)
setDT(df)
df[,tstrsplit(player,split=" (?=[A-Z]{2,})",perl=TRUE)]
#>                   V1 V2  V3
#> 1:      Eloy Jimenez OF CHW
#> 2: Fernando Tatis Jr SS  SD
Or tidyr
tidyr::separate(df,player,sep=" (?=[A-Z]{2,})",into=paste0("V",1:3))
Upvotes: 1
Reputation: 887881
We can use extract from tidyr to capture the upper-case tokens at the end of the string
library(stringr)
library(tidyr)
df %>%
extract(player, into = str_c('col', 1:3), '^(.*)\\s+([A-Z]+)\\s+([A-Z]+)$')
# A tibble: 2 x 3
#  col1              col2  col3
#  <chr>             <chr> <chr>
#1 Eloy Jimenez      OF    CHW
#2 Fernando Tatis Jr SS    SD
Or with strcapture from base R
strcapture('^(.*)\\s+([A-Z]+)\\s+([A-Z]+)$', df$player,
data.frame(col1 = character(), col2 = character(), col3 = character()))
#               col1 col2 col3
#1      Eloy Jimenez   OF  CHW
#2 Fernando Tatis Jr   SS   SD
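One practical difference between the anchored pattern and a lookahead split may show up when a name itself contains an all-caps token. The rows below are hypothetical and not from the question; this is only a sketch of how the two approaches could diverge on such data.

```r
# Hypothetical rows: the first has an all-caps suffix inside the name
players <- c("Ronald Acuna JR OF ATL", "AJ Pollock OF LAD")

# Lookahead split: breaks before every run of two or more capitals
# that follows a space, so "JR" becomes its own piece
pieces <- strsplit(players, " (?=[A-Z]{2,})", perl = TRUE)
lengths(pieces)

# Anchored pattern: only the last two all-caps tokens are peeled off,
# and everything before them stays in col1
anchored <- strcapture(
  "^(.*)\\s+([A-Z]+)\\s+([A-Z]+)$", players,
  data.frame(col1 = character(), col2 = character(), col3 = character())
)
anchored
```

If the real data can contain names like these, anchoring at the end of the string is probably the safer choice.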
Upvotes: 2