Jazzmatazz

Reputation: 645

Split a data frame column into several columns

I have a column that I'm looking to split into several columns. I'm not that familiar with regexp, so I'm not sure of the right way to go about this.

Sample data

df <- tibble::tribble(
  ~player,
  "Eloy Jimenez OF CHW",
  "Fernando Tatis Jr SS SD"
)

I'm looking to split the column where the all-uppercase codes start. For example:

output_df <- tibble::tribble(
  ~col1, ~col2, ~col3,
  "Eloy Jimenez", "OF", "CHW",
  "Fernando Tatis Jr", "SS", "SD"
)

Thanks in advance.

Upvotes: 1

Views: 68

Answers (3)

mnist

Reputation: 6954

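You can use a lookahead, splitting on a space that is followed by two or more uppercase letters:

library(stringr)
# (?=...) only checks what follows the space, so "OF" and "CHW" stay in the pieces;
# "Jr" has just one capital letter, so no split happens before it
str_split(string = "Eloy Jimenez OF CHW",
          pattern = " (?=[[:upper:]]{2,})") %>%
  unlist()
# gives "Eloy Jimenez" "OF" "CHW"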
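
To apply the same split to every row of df and get one column per piece, a sketch along these lines should work. It assumes stringr and tibble are installed and that every player string ends in exactly two uppercase codes; the names col1 to col3 are taken from the question's output_df:

```r
library(stringr)

df <- tibble::tribble(
  ~player,
  "Eloy Jimenez OF CHW",
  "Fernando Tatis Jr SS SD"
)

# str_split_fixed() returns a character matrix with exactly n columns
parts <- str_split_fixed(df$player, " (?=[[:upper:]]{2,})", n = 3)
colnames(parts) <- c("col1", "col2", "col3")

# turn the matrix back into a tibble shaped like output_df
out <- tibble::as_tibble(parts)
out
```

If a row has fewer than two codes, str_split_fixed() pads the missing columns with empty strings rather than NA.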

Upvotes: 1

Frank Zhang

Reputation: 1688

Or, using data.table:

library(data.table)
df <- tibble::tribble(
  ~player,
  "Eloy Jimenez OF CHW",
  "Fernando Tatis Jr SS SD"
)

setDT(df)

df[, tstrsplit(player, split = " (?=[A-Z]{2,})", perl = TRUE)]
#>                   V1 V2  V3
#> 1:      Eloy Jimenez OF CHW
#> 2: Fernando Tatis Jr SS  SD
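
If you'd rather keep player and add the pieces as new columns by reference, the same tstrsplit() call can be assigned with :=. A minimal sketch, using a fresh data.table named dt and the column names col1 to col3 from the question (both are just illustrative choices):

```r
library(data.table)

dt <- data.table(player = c("Eloy Jimenez OF CHW", "Fernando Tatis Jr SS SD"))

# tstrsplit() returns a list with one element per piece; := assigns those
# elements to the three new columns in place, without copying dt
dt[, c("col1", "col2", "col3") := tstrsplit(player, split = " (?=[A-Z]{2,})", perl = TRUE)]
print(dt)
```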

Or with tidyr:

tidyr::separate(df, player, sep = " (?=[A-Z]{2,})", into = paste0("V", 1:3))
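
In tidyr 1.3.0 and later, separate() is superseded, and separate_wider_regex() can express the same split by describing each piece rather than the separator. A sketch, assuming tidyr >= 1.3.0: named patterns become output columns, while the unnamed "\\s+" pieces are matched but dropped.

```r
library(tidyr)

df <- tibble::tibble(player = c("Eloy Jimenez OF CHW", "Fernando Tatis Jr SS SD"))

# the patterns are joined and must match the whole string; the greedy ".*"
# backtracks until exactly two space-separated uppercase codes remain
out <- separate_wider_regex(
  df,
  player,
  patterns = c(col1 = ".*", "\\s+", col2 = "[A-Z]+", "\\s+", col3 = "[A-Z]+")
)
out
```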

Upvotes: 1

akrun

Reputation: 887881

We can use extract from tidyr to capture the two uppercase codes at the end of the string:

library(stringr)
library(tidyr)
df %>% 
   extract(player, into = str_c('col', 1:3), '^(.*)\\s+([A-Z]+)\\s+([A-Z]+)$')
# A tibble: 2 x 3
#  col1              col2  col3 
#  <chr>             <chr> <chr>
#1 Eloy Jimenez      OF    CHW  
#2 Fernando Tatis Jr SS    SD   
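
Because that pattern is anchored with ^ and $ and requires exactly two uppercase codes after the name, a row with a different shape comes back as NA in every new column instead of being split partially. A quick check with a made-up second row ("Jane Doe DH" is hypothetical data with only one code):

```r
library(tidyr)

df2 <- tibble::tibble(player = c("Eloy Jimenez OF CHW", "Jane Doe DH"))  # hypothetical data

out <- extract(df2, player, into = c("col1", "col2", "col3"),
               regex = "^(.*)\\s+([A-Z]+)\\s+([A-Z]+)$")
# the first row splits as before; the second row doesn't match, so col1:col3 are NA
out
```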

Or with strcapture from base R:

strcapture('^(.*)\\s+([A-Z]+)\\s+([A-Z]+)$', df$player,
   data.frame(col1 = character(), col2 = character(), col3 = character()))
#               col1 col2 col3
#1      Eloy Jimenez   OF  CHW
#2 Fernando Tatis Jr   SS   SD

Upvotes: 2
