Reputation: 43
My data frame is:
df <- data.frame(player = c("Taiwo Awoniyi/e5478b87", "Jacob Bruun Larsen/4e204552", "Andi Zeqiri/d01231f0"), goals = c(2,5,7))
I want to remove the "/" and the ID that follows it in the "player" column, to ideally have:
df <- data.frame(player = c("Taiwo Awoniyi", "Jacob Bruun Larsen", "Andi Zeqiri"), goals = c(2,5,7))
I am unsure of how to approach this since player names vary greatly in length and some numbers are larger than others.
Upvotes: 4
Views: 1283
Reputation: 72828
Using base R.
transform(df, player=gsub('/.+', '', player))
# player goals
# 1 Taiwo Awoniyi 2
# 2 Jacob Bruun Larsen 5
# 3 Andi Zeqiri 7
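One detail worth checking is the quantifier: '/.+' needs at least one character after the slash, whereas '/.*' also matches a bare trailing slash. A minimal sketch, assuming a made-up vector x whose second element is hypothetical and not part of the question's data:

```r
# x is illustrative only; "Trailing Slash/" does not come from the question
x <- c("Taiwo Awoniyi/e5478b87", "Trailing Slash/")

with_plus <- gsub("/.+", "", x)  # requires at least one character after the slash
with_star <- gsub("/.*", "", x)  # also matches a slash with nothing after it

with_plus
# [1] "Taiwo Awoniyi"   "Trailing Slash/"
with_star
# [1] "Taiwo Awoniyi"  "Trailing Slash"
```

For the data in the question both patterns give the same result, since every ID is non-empty.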
Upvotes: 4
Reputation: 78927
We could use separate, with extra = 'drop' added (many thanks to Onyambu):
library(dplyr)
library(tidyr)
df %>%
separate(player, "player", sep="/", extra = 'drop')
player goals
1 Taiwo Awoniyi 2
2 Jacob Bruun Larsen 5
3 Andi Zeqiri 7
Upvotes: 5
Reputation: 21400
You can capture the substring you want to keep with a negated character class that matches any character except the /, and then backreference it:
library(dplyr)
df %>%
  mutate(player = sub("([^/]+).*", "\\1", player))
player goals
1 Taiwo Awoniyi 2
2 Jacob Bruun Larsen 5
3 Andi Zeqiri 7
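To see what the capture group does on its own, here is a minimal sketch on a plain character vector. The vector x is made up for illustration, and its last element deliberately contains no / at all:

```r
# x is illustrative only; "No Slash" does not come from the question
x <- c("Taiwo Awoniyi/e5478b87", "Andi Zeqiri/d01231f0", "No Slash")

# ([^/]+) captures the leading run of non-slash characters,
# .* consumes whatever is left, and \\1 puts back only the captured part
kept <- sub("([^/]+).*", "\\1", x)
kept
# [1] "Taiwo Awoniyi" "Andi Zeqiri"   "No Slash"
```

A string without a slash is returned unchanged, because the capture group then covers the whole string.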
More simply, you can just remove the / and the letters and digits that follow it:
df %>%
mutate(player = gsub("/[[:alnum:]]*$", "", player))
In base R syntax:
df$player <- gsub("/[[:alnum:]]*$", "", df$player)
Upvotes: 5
Reputation: 13319
Using dplyr for the pipe and mutate, we can gsub away the / and everything after it.
df %>%
mutate(player = gsub("\\/.*", "", player))
player goals
1 Taiwo Awoniyi 2
2 Jacob Bruun Larsen 5
3 Andi Zeqiri 7
Upvotes: 4