I have a data frame with rows that look like this:
Rank..Player Pos Team PosRank
1. Le'Veon Bell RB PIT RB1
2. Todd Gurley II RB LAR RB2
The problem is that the rank and the player name are stored as a single string in the first column, and some names contain periods, which makes it harder to split the two:
18. A.J. Green WR CIN WR7
The solutions I've found only handle strings made up of letters and digits. I need a way to split the first column into rank and name without breaking up names like the one above.
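A quick illustration of the problem, using the A.J. Green string from above: splitting on every literal period also breaks the initials apart, so a naive split returns four pieces instead of two.

```r
x <- "18. A.J. Green"
# fixed = TRUE treats "." as a literal period, so every period is a split point,
# including the ones inside the initials "A.J."
naive <- strsplit(x, ".", fixed = TRUE)[[1]]
print(naive)
```

The rank survives, but the name is scattered across three elements (with leading spaces), which is why the split has to be restricted to the period that follows the rank.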
Here is the code I used to scrape the data from ESPN:
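library(rvest)  # read_html(), html_nodes(), html_table(); rvest also re-exports %>%

page <- read_html("http://www.espn.com/fantasy/football/story/_/page/18RanksPreseason300nonPPR/2018-fantasy-football-non-ppr-rankings-top-300")
ranks <- page %>%
  html_nodes("table.inline-table") %>%
  .[[2]] %>%        # take the second matching table
  html_table()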
Here is one option using strsplit:
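df <- data.frame(x = "2. Todd Gurley II", stringsAsFactors = FALSE)
# split only at a period that directly follows a digit (plus the whitespace after it),
# so periods inside names such as "A.J." are left alone
out <- strsplit(df$x, "(?<=\\d)\\.\\s+", perl = TRUE)
df <- data.frame(df, do.call(rbind, out), stringsAsFactors = FALSE)
names(df) <- c("RankPlayer", "Rank", "Player")
df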
RankPlayer Rank Player
1 2. Todd Gurley II 2 Todd Gurley II
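Applied to the row that motivated the question, the lookbehind only matches the period that directly follows a digit, so the periods in "A.J." survive. A minimal sketch, assuming a small hand-typed vector (rank_player, a hypothetical name) as a stand-in for the scraped first column:

```r
rank_player <- c("1. Le'Veon Bell", "2. Todd Gurley II", "18. A.J. Green")
# split at a period preceded by a digit, consuming the whitespace after it
parts <- strsplit(rank_player, "(?<=\\d)\\.\\s+", perl = TRUE)
# each element is c(rank, name); rbind them into a two-column character matrix
split_df <- data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(split_df) <- c("Rank", "Player")
split_df$Rank <- as.integer(split_df$Rank)  # ranks are numbers, not strings
print(split_df)
```

Converting Rank to integer afterwards is optional but makes sorting and filtering by rank behave numerically.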
We can use sub to insert a delimiter and then split the string into two columns with read.csv:
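# turn "18. A.J. Green" into "18,A.J. Green", then let read.csv split on the comma
tmp <- read.csv(text = sub("^(\\d+)\\.\\s+(.*)", "\\1,\\2", ranks[[1]]),
                header = FALSE, col.names = c("Rank", "Player"),
                stringsAsFactors = FALSE)
ranks1 <- cbind(tmp, ranks[-1])
head(ranks1, 2)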
# Rank Player Pos Team PosRank
#1 1 Le'Veon Bell RB PIT RB1
#2 2 Todd Gurley II RB LAR RB2
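Since the code above needs the live ESPN page, here is a self-contained sketch of the same sub + read.csv approach. It assumes a hypothetical hand-built ranks data frame with the same column layout as the scraped table, including the A.J. Green row. Because the inserted delimiter is a comma, a name that itself contained a comma would be split further; none of these sample names do.

```r
# Hypothetical stand-in for the scraped table (same columns as in the question)
ranks <- data.frame(
  `Rank, Player` = c("1. Le'Veon Bell", "2. Todd Gurley II", "18. A.J. Green"),
  Pos = c("RB", "RB", "WR"),
  Team = c("PIT", "LAR", "CIN"),
  PosRank = c("RB1", "RB2", "WR7"),
  check.names = FALSE, stringsAsFactors = FALSE
)
# "<digits>. <name>" becomes "<digits>,<name>"; only the leading rank is touched
tmp <- read.csv(text = sub("^(\\d+)\\.\\s+(.*)", "\\1,\\2", ranks[[1]]),
                header = FALSE, col.names = c("Rank", "Player"),
                stringsAsFactors = FALSE)
# put the two new columns in front of the remaining original columns
ranks1 <- cbind(tmp, ranks[-1])
print(ranks1)
```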
Or with separate:
library(tidyr)
# \\s+ also consumes the space after the period, so Player has no leading blank
separate(ranks, `Rank, Player`, into = c("Rank", "Player"), sep = "(?<=[0-9])\\.\\s+")
EDIT: based on @AndS's comments
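A further base-R alternative, swapped in here and not used in either answer, is utils::strcapture (available since R 3.4.0). It captures the rank and the name in one regular expression, and proto fixes the output column names and types. A minimal sketch, again assuming a hand-typed vector (rank_player, a hypothetical name) in place of the scraped column:

```r
rank_player <- c("1. Le'Veon Bell", "2. Todd Gurley II", "18. A.J. Green")
# group 1: the leading digits; group 2: everything after "<digits>. "
# proto supplies the column names and types (Rank is coerced with as.integer)
parsed <- utils::strcapture(
  "^(\\d+)\\.\\s+(.*)$", rank_player,
  proto = data.frame(Rank = integer(), Player = character(),
                     stringsAsFactors = FALSE),
  perl = TRUE
)
print(parsed)
```

Rows that do not match the pattern come back as NA rather than raising an error, which makes malformed entries easy to spot.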