Chris Ewanik

Reputation: 69

Split a data frame string column based on a backslash (\)

I am trying to split a string column in my data frame into two different columns based on a backslash (\). My data looks like this:

     Rk Player  Season   Age Tm    Lg       WS     G    GS    MP    FG   FGA  `2P` `2PA`  `3P` `3PA`    FT   FTA   ORB   DRB   TRB   AST   STL   BLK   TOV    PF
  <dbl> <chr>   <chr>  <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 "LeBro~ 2010-~    26 MIA   NBA    15.6    79    79  3063   758  1485   666  1206    92   279   503   663    80   510   590   554   124    50   284   163
2     2 "Pau G~ 2010-~    30 LAL   NBA    14.7    82    82  3037   593  1120   592  1117     1     3   354   430   268   568   836   273    48   130   142   203

The column I am trying to manipulate is the 'Player' column. Its entries look like this:

"LeBron James\\jamesle01"        "Pau Gasol\\gasolpa01"           "Dwight Howard\\howardw01"

I want it split into two columns, pName and pID:

pName           pID
LeBron James    jamesle01
Pau Gasol       gasolpa01
Dwight Howard   howardw01

I have tried using gsub, sub, str_replace, and separate, but have not been able to get any of them to work:

player_stats2010 %>%
  str_replace(Player,"\\\.*","")
player_stats2010 %>%
  sub("\\\", "", player_stats2010$Player)
player_stats2010 %>%
  sub("\.*", "", player_stats2010$Player)
player_stats2010 %>%
  gsub("\.*", "", player_stats2010$Player)
player_stats2010_test <- player_salaries_2010 %>%
  separate(Player, c("pName", "pID"), "\")

I really don't understand the syntax needed here, despite searching online and reading several other questions. If you could help me understand what I am missing, that would be awesome. Thank you so much :)

Upvotes: 0

Views: 52

Answers (2)

G. Grothendieck

Reputation: 269714

1) read.table. Using x from the Note at the end and only base R, use read.table as shown:

read.table(text = x, sep = "\\", col.names = c("pName", "pID"))

giving:

          pName       pID
1  LeBron James jamesle01
2     Pau Gasol gasolpa01
3 Dwight Howard howardw01
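To use the same read.table idea on a data frame column while keeping the other columns, one option is to parse the Player column and bind the result back on. This is a minimal sketch, assuming a data frame shaped like the question's player_stats2010; the three-row df below is a hypothetical stand-in with only a Player column and an Rk column. Setting quote = "" is an extra precaution not in the call above: by default read.table treats ' as a quote character, which would break on names containing an apostrophe.

```r
# Hypothetical stand-in for the question's data frame (only two columns kept)
df <- data.frame(
  Rk = 1:3,
  Player = c("LeBron James\\jamesle01", "Pau Gasol\\gasolpa01", "Dwight Howard\\howardw01")
)

# Parse "name\id" strings; "\\" is a single backslash character used as the separator.
# quote = "" and comment.char = "" stop apostrophes or # in names being treated specially.
split_cols <- read.table(text = df$Player, sep = "\\", quote = "", comment.char = "",
                         col.names = c("pName", "pID"), stringsAsFactors = FALSE)

# Drop the original Player column and append the two new columns
df <- cbind(df[setdiff(names(df), "Player")], split_cols)
df
```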

2) tidyr

With tidyr we could do this (the raw-string syntax r"{...}" requires R 4.0.0 or later):

library(tidyr)
data.frame(x) %>%
  separate(x, c("pName", "pID"), sep = r"{\\}")
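The source of confusion in the question is that the backslash has to be escaped twice: once for the R string literal and once for the regular expression. The sketch below (base R only, reusing x from the Note) shows that r"{\\}" and "\\\\" are the same two-character regex, and that fixed = TRUE avoids regex escaping altogether. A lone "\\" passed as a regex is an incomplete escape, which is why sep = "\" and similar attempts fail.

```r
x <- c("LeBron James\\jamesle01", "Pau Gasol\\gasolpa01", "Dwight Howard\\howardw01")

# In R source code, "\\" is a string containing ONE backslash character
stopifnot(nchar("\\") == 1)

# A regex must escape that backslash too, so the pattern text is two backslashes:
# "\\\\" as an ordinary string literal, or r"{\\}" as a raw string (R >= 4.0.0)
stopifnot(identical("\\\\", r"{\\}"))

# Regex split: the pattern matches one literal backslash
parts <- strsplit(x, "\\\\")

# Literal split: fixed = TRUE takes the pattern as-is, so one backslash is enough
parts_fixed <- strsplit(x, "\\", fixed = TRUE)
stopifnot(identical(parts, parts_fixed))

# First piece is the name, second is the ID
pName <- vapply(parts, `[`, character(1), 1)
pID   <- vapply(parts, `[`, character(1), 2)
```

Because separate() interprets sep as a regular expression, the same reasoning means the question's last attempt would need sep = "\\\\" (or r"{\\}") rather than "\".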

Note

The input is assumed to be:

x <- c("LeBron James\\jamesle01", "Pau Gasol\\gasolpa01", "Dwight Howard\\howardw01")

Upvotes: 3

Chris Ruehlemann

Reputation: 21400

You can use str_extract from the stringr package together with character classes that do not include the backslash:

library(stringr)
player_stats$pName <- str_extract(player_stats$Player, "^[\\w\\s]+")
player_stats$pID <- str_extract(player_stats$Player, "[\\w\\s]+$")

In both cases you define a character class that matches only word characters (\\w: letters, digits and underscore) and whitespace (\\s), so the match stops at the backslash. The difference between the two variables is that pName anchors the pattern at the beginning of the string (^), while pID anchors it at the end of the string ($).

Result:

player_stats
                    Player         pName       pID
1  LeBron James\\jamesle01  LeBron James jamesle01
2     Pau Gasol\\gasolpa01     Pau Gasol gasolpa01
3 Dwight Howard\\howardw01 Dwight Howard howardw01

Data:

player_stats <- data.frame(
  Player = c("LeBron James\\jamesle01", "Pau Gasol\\gasolpa01", "Dwight Howard\\howardw01"))
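A base-R variant closer to the sub() attempts in the question is to delete text on one side of the backslash instead of whitelisting characters. This is a sketch, not part of the answer above; one possible advantage is that it does not depend on which characters appear in the name, whereas [\\w\\s]+ stops early at an apostrophe, period, or hyphen (the test below uses a hypothetical name to show this).

```r
player_stats <- data.frame(
  Player = c("LeBron James\\jamesle01", "Pau Gasol\\gasolpa01", "Dwight Howard\\howardw01"))

# pName: remove the backslash and everything after it ("\\\\" is a regex for one backslash)
player_stats$pName <- sub("\\\\.*$", "", player_stats$Player)

# pID: remove everything up to and including the (last) backslash
player_stats$pID <- sub("^.*\\\\", "", player_stats$Player)

player_stats
```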

Upvotes: 1
