Reputation: 69
I am trying to split a string column in my data frame into two columns on a backslash (\). My data looks like this:
Rk Player Season Age Tm Lg WS G GS MP FG FGA `2P` `2PA` `3P` `3PA` FT FTA ORB DRB TRB AST STL BLK TOV PF
<dbl> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 "LeBro~ 2010-~ 26 MIA NBA 15.6 79 79 3063 758 1485 666 1206 92 279 503 663 80 510 590 554 124 50 284 163
2 2 "Pau G~ 2010-~ 30 LAL NBA 14.7 82 82 3037 593 1120 592 1117 1 3 354 430 268 568 836 273 48 130 142 203
The column I am trying to manipulate is the 'Player' column. Its entries look like this:
"LeBron James\\jamesle01" "Pau Gasol\\gasolpa01" "Dwight Howard\\howardw01"
I want it split into two columns, pName and pID:
pName pID
LeBron James jamesle01
Pau Gasol gasolpa01
Dwight Howard howardw01
I have tried using gsub, sub, str_replace, and separate, and have not been able to figure it out:
player_stats2010 %>%
+ str_replace(Player,"\\\.*","")
player_stats2010 %>%
+ sub("\\\", "", player_stats2010$Player)
player_stats2010 %>%
+ sub("\.*", "", player_stats2010$Player)
player_stats2010 %>%
+ gsub("\.*", "", player_stats2010$Player)
player_stats2010_test <- player_salaries_2010 %>%
+ separate(Player, c("pName", "pID"), "\")
I really do not understand the syntax for this question despite looking online and at several other questions. If you could please help me understand what I do not understand, that would be awesome. Thank you so much :)
Upvotes: 0
Views: 52
Reputation: 269714
1) read.table
Using x from the Note at the end and only base R, use read.table as shown:
read.table(text = x, sep = "\\", col.names = c("pName", "pID"))
giving:
pName pID
1 LeBron James jamesle01
2 Pau Gasol gasolpa01
3 Dwight Howard howardw01
2) tidyr
With tidyr we could do this:
library(tidyr)
data.frame(x) %>%
separate(x, c("pName", "pID"), sep = r"{\\}")
Note
The input is assumed to be:
x <- c("LeBron James\\jamesle01", "Pau Gasol\\gasolpa01", "Dwight Howard\\howardw01")
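For anyone puzzled by the number of backslashes, here is a minimal base R sketch of the escaping levels, using the same x. It is only an illustration: the names parts, parts_fixed and res are made up for this example. The attempt separate(Player, c("pName", "pID"), "\") in the question fails before any splitting happens, because \" escapes the closing quote and the string literal is never terminated.

```r
x <- c("LeBron James\\jamesle01", "Pau Gasol\\gasolpa01", "Dwight Howard\\howardw01")

# In R source code "\\" is ONE backslash character; print() shows it doubled,
# writeLines() shows what is really stored.
n_chars <- nchar("\\")   # a single character
writeLines(x[1])

# A regular expression needs the backslash escaped once more,
# so the R string literal ends up with four of them.
parts <- strsplit(x, "\\\\")

# With fixed = TRUE the pattern is taken literally, so two are enough.
parts_fixed <- strsplit(x, "\\", fixed = TRUE)

# Collect the pieces into the two requested columns.
res <- data.frame(
  pName = vapply(parts, `[`, character(1), 1),
  pID   = vapply(parts, `[`, character(1), 2)
)
```

Both strsplit calls should give the same pieces; the fixed = TRUE form is usually easier to read when the delimiter is a literal character rather than a pattern.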
Upvotes: 3
Reputation: 21400
You can use str_extract from the package stringr, together with character classes that do not include \\:
library(stringr)
player_stats$pName <- str_extract(player_stats$Player, "^[\\w\\s]+")
player_stats$pID <- str_extract(player_stats$Player, "[\\w\\s]+$")
In both cases you define a character class that allows only word characters (\\w: letters, digits, and underscore) and whitespace characters (\\s). The difference between the two variables is that pName matches that pattern from the beginning of the string (^), while pID matches it at the end of the string ($).
Result:
player_stats
Player pName pID
1 LeBron James\\jamesle01 LeBron James jamesle01
2 Pau Gasol\\gasolpa01 Pau Gasol gasolpa01
3 Dwight Howard\\howardw01 Dwight Howard howardw01
Data:
player_stats <- data.frame(
Player = c("LeBron James\\jamesle01", "Pau Gasol\\gasolpa01", "Dwight Howard\\howardw01"))
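One caveat with the [\\w\\s] class above: it stops at apostrophes, hyphens and periods, which can occur in player names. A possible alternative is a negated class that matches anything except a backslash. The sketch below uses base R with perl = TRUE so it runs without extra packages; the same patterns should also work as the pattern argument of str_extract. The player entry is hypothetical and not taken from the question's data, and name_ws is a made-up variable name.

```r
# Hypothetical entry (not from the question's data) with an apostrophe and a hyphen.
player <- "Pat O'Connor-Smith\\oconnpa01"

# [\w\s]+ stops at the first character that is neither a word character
# nor whitespace, so the name is cut off at the apostrophe.
name_ws <- regmatches(player, regexpr("^[\\w\\s]+", player, perl = TRUE))

# A negated class (anything except a backslash) keeps the whole name
# when anchored at the start, and the whole ID when anchored at the end.
pName <- regmatches(player, regexpr("^[^\\\\]+", player, perl = TRUE))
pID   <- regmatches(player, regexpr("[^\\\\]+$", player, perl = TRUE))
```

Here name_ws comes out as "Pat O", while pName and pID hold the full name and the ID.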
Upvotes: 1