lingaaah

Reputation: 31

How to use gsub() to partially match part of a string and replace all strings containing that partial match?

I am trying to find the word 'dunk' in a column of NBA shot types, since there are various types of dunks.

I then want to replace all the different dunk types with just the word 'dunk'.

nbaClean2 <- nbaData %>% 
    select(x, y, points, type, result, team, player) %>% 
    filter(team == 'LAL', y <= 47) %>% 
    na_if("") %>% 
    drop_na() %>% 
    mutate(result = ifelse(result == 'missed', 'FGA', 'FGM')) %>% 
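    # Attempt 1: replace using a word-boundary pattern
    mutate(dunk = gsub('\bdunk', 'dunk', type))
    # Attempt 2: flag matches with grepl(), then replace TRUE with 'dunk'
    mutate(dunk = grepl('dunk', type), gsub('TRUE', 'dunk', dunk))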

Essentially, I am trying to see whether I can use gsub() to partially match the word 'dunk'.

I would like a solution where I don't have to write out every shot type, as there are a lot of them.

Upvotes: 1

Views: 934

Answers (1)

Gregor Thomas

Reputation: 146239

gsub() replaces only the part of the string that matches the pattern. To replace the whole string, make the pattern match the whole string by wrapping it in .*, which matches any sequence of characters.

mutate(dunk = gsub('.*dunk.*', 'dunk', type))
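To see the difference outside of the pipeline, here is a minimal base R sketch. The shot types in `type` are made up for illustration; the real values in your data may differ.

```r
# Hypothetical shot types (assumed values, not taken from nbaData)
type <- c("slam dunk", "driving dunk", "jump shot", "layup")

# Without .* only the matched substring "dunk" is replaced (by itself),
# so the strings come back unchanged.
partial <- gsub("dunk", "dunk", type)

# With .* on both sides the pattern consumes the whole string,
# so the whole string is replaced.
whole <- gsub(".*dunk.*", "dunk", type)

print(partial)
print(whole)
```

Strings that do not contain "dunk" are left as they are, because gsub() returns the input unchanged when there is no match.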

(Note I got rid of your \b word boundary. If you want to use it, you'd need a double backslash ".*\\bdunk.*", but then you wouldn't match anything that doesn't have a word boundary before dunk, e.g., "slamdunk" would not match.)
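A small sketch of the backslash issue, again on made-up values. In an R string literal a single "\b" is a backspace character, so the pattern never reaches the regex engine as a word boundary.

```r
# Hypothetical shot types (assumed values for illustration)
type <- c("slam dunk", "slamdunk", "jump shot")

# "\b" is a single backspace character, so this pattern is 5 characters long
# and looks for a literal backspace followed by "dunk".
single_backslash <- grepl("\bdunk", type)

# "\\b" is passed to the regex engine as \b, a word boundary.
# "slamdunk" has no boundary before "dunk", so it does not match.
word_boundary <- grepl("\\bdunk", type)

# No boundary at all: matches "dunk" anywhere in the string.
anywhere <- grepl("dunk", type)

print(single_backslash)
print(word_boundary)
print(anywhere)
```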

A potentially more efficient option would be to detect the "dunk" pattern and then replace the whole string without regex, e.g.

mutate(dunk = ifelse(grepl("dunk", type, fixed = TRUE), "dunk", "not dunk"))
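One thing to keep in mind with fixed = TRUE is that the match is literal and case-sensitive. The sketch below uses made-up values, including a capitalized "Dunk" that may or may not occur in your data, to show the effect and a case-insensitive alternative.

```r
# Hypothetical shot types (assumed values; capitalization is illustrative)
type <- c("slam dunk", "Alley Oop Dunk", "jump shot")

# Literal, case-sensitive match: "Alley Oop Dunk" is not detected.
dunk <- ifelse(grepl("dunk", type, fixed = TRUE), "dunk", "not dunk")

# Case-insensitive regex match. ignore.case cannot be combined with
# fixed = TRUE, so fixed is left at its default here.
dunk_ci <- ifelse(grepl("dunk", type, ignore.case = TRUE), "dunk", "not dunk")

print(dunk)
print(dunk_ci)
```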

It's not clear what value you want in the newly created dunk column when type doesn't include "dunk". I'd consider making it a logical column, simply with dunk = grepl("dunk", type). If you post sample input and desired output, it is much easier to help. Perhaps you don't want a dunk column at all, but just want to change type to "dunk" if it includes the word "dunk", like this: mutate(type = ifelse(grepl("dunk", type), "dunk", type)).
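Both of those options can be sketched in base R on a small made-up data frame. The name `shots`, the column `is_dunk`, and the values are assumptions for illustration only.

```r
# Hypothetical data frame standing in for the filtered NBA data
shots <- data.frame(
  type = c("slam dunk", "driving dunk", "jump shot"),
  stringsAsFactors = FALSE
)

# Option 1: a logical column that flags dunks
shots$is_dunk <- grepl("dunk", shots$type)

# Option 2: collapse every dunk variant in type to "dunk",
# leaving all other shot types unchanged
shots$type <- ifelse(grepl("dunk", shots$type), "dunk", shots$type)

print(shots)
```

Inside a dplyr pipeline the same two expressions would go inside mutate(), as in the snippets above.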

Upvotes: 2
