hirshg

Reputation: 155

Splitting string on a delimiter

I have a data frame with rows that look like this:

Rank..Player      Pos Team PosRank
1. Le'Veon Bell    RB  PIT     RB1
2. Todd Gurley II  RB  LAR     RB2

The issue is that the numbers and names in the first column are one string, and some names have periods in them, making it somewhat trickier to split the two:

18. A.J. Green  WR  CIN  WR7

All solutions I've seen involve splitting strings that contain only numbers and letters. I need a way to split the first column in a way that won't split names like the one above.

Here is the code I used to scrape the data from ESPN:

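library(rvest)  # provides read_html(), html_nodes(), html_table() and re-exports %>%

# The URL must be a single unbroken string; a line break inside the quotes would corrupt it
df <- read_html("http://www.espn.com/fantasy/football/story/_/page/18RanksPreseason300nonPPR/2018-fantasy-football-non-ppr-rankings-top-300")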

ranks <- df %>%
  html_nodes("table.inline-table") %>%
  .[[2]] %>%
  html_table()

Upvotes: 1

Views: 100

Answers (2)

Tim Biegeleisen

Reputation: 522741

Here is one option using strsplit:

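# Toy one-row data frame; use `=` (not `<-`) so the column is actually named x
df <- data.frame(x = "2. Todd Gurley II", stringsAsFactors = FALSE)
# Split on a period that directly follows a digit, consuming the whitespace after it
out <- strsplit(df$x, "(?<=\\d)\\.\\s+", perl = TRUE)
df <- data.frame(df, do.call(rbind, out))
names(df) <- c("RankPlayer", "Rank", "Player")
df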

         RankPlayer Rank         Player
1 2. Todd Gurley II    2 Todd Gurley II
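
As a quick check against the case raised in the question, here is a minimal sketch that applies the same pattern to a name containing periods. The vector `x` is made-up sample data mirroring the first column, not the scraped table itself.

```r
# Hypothetical sample values mirroring the first column in the question
x <- c("1. Le'Veon Bell", "18. A.J. Green")

# Split only on a period that directly follows a digit (plus the whitespace after it);
# the periods in "A.J." follow letters, so they are left alone
parts <- strsplit(x, "(?<=\\d)\\.\\s+", perl = TRUE)

# Stack the two-element pieces into a matrix with one row per input string
out <- do.call(rbind, parts)
colnames(out) <- c("Rank", "Player")
out
```

Because the lookbehind requires a digit immediately before the period, only the separator after the rank should qualify as a split point.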


Upvotes: 2

akrun

Reputation: 887901

We can use sub to replace the period and whitespace after the rank with a comma delimiter, and then separate the result into two columns with read.csv:

tmp <- read.csv(text=sub("^(\\d+)\\.\\s+(.*)", "\\1,\\2", 
             ranks[[1]]), header = FALSE, col.names =c("Rank", "Player"))
ranks1 <- cbind(tmp, ranks[-1])
head(ranks1, 2)
#   Rank         Player Pos Team PosRank
#1    1   Le'Veon Bell  RB  PIT     RB1
#2    2 Todd Gurley II  RB  LAR     RB2
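
To see what the intermediate step does, here is a small sketch that runs the same sub and read.csv calls on a made-up vector (`x` is hypothetical sample data, standing in for `ranks[[1]]`), including the name with periods from the question.

```r
# Hypothetical sample values standing in for ranks[[1]]
x <- c("1. Le'Veon Bell", "18. A.J. Green")

# Step 1: capture the leading digits and the rest of the string,
# and rejoin them with a comma in place of ". "
delimited <- sub("^(\\d+)\\.\\s+(.*)", "\\1,\\2", x)
delimited

# Step 2: parse the comma-delimited strings as a two-column table
tmp <- read.csv(text = delimited, header = FALSE,
                col.names = c("Rank", "Player"))
tmp
```

Since the pattern is anchored at the start and sub replaces only the first match, later periods in the name are untouched. One caveat of this approach is that a player name containing a comma would be split into an extra field.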

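Or with separate from tidyr:

library(tidyr)
# The sep pattern also consumes the whitespace after the period, so Player has no leading space
separate(ranks, `Rank, Player`, into = c("Rank", "Player"), sep = "(?<=[0-9])\\.\\s+")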

EDIT: Based on @AndS comments

Upvotes: 2
