Reputation: 141
I'm practicing my regex with R on a football schedule and can't figure this out
I'm essentially trying to change any home game to the string HOME. Here is a snippet of the schedule_teams data frame that I am using:
Team w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13 w14
1 ARI SD @NYG SF BYE @DEN WSH @OAK PHI @DAL STL DET @SEA @ATL KC
2 ATL NO @CIN TB @MIN @NYG CHI @BAL DET BYE @TB @CAR CLE ARI @GB
3 BAL CIN PIT @CLE CAR @IND @TB ATL @CIN @PIT TEN BYE @NO SD @MIA
Away teams have an @ symbol at the start of the string; home teams do not. Using regex in Python, I believe all home teams can be selected with a pattern like ^([A-Z])\w+, which essentially says "begins with a capital letter". This doesn't work in R because of the \w, among other errors.
Here is what I tried (and failed):
str_replace_all(as.matrix(schedule_teams), "[[^([A-Z])\w+]]", "HOME")
Is there an easier way to change all home teams to HOME?
Thanks in advance.
Upvotes: 4
Views: 208
Reputation: 70732
Your regular expression syntax is incorrect. You have wrapped it inside nested character classes, and you are trying to use a capturing group inside the class, which causes the pattern to fail when it reaches the closing )
In short, your regular expression currently defines a set of characters (not what you want) and then fails.
[[^([A-Z] # any character of: '[', '^', '(', '[', 'A' to 'Z'
To fix this issue, remove the character classes and the capturing group that you have placed inside, and make sure you double escape \w (written as \\w in an R string) in your regular expression pattern. Then it should work for you.
I tested this on my console and it worked fine.
> df[,-1] <- str_replace_all(as.matrix(df[,-1]), '^[A-Z]\\w+', 'HOME')
## Team w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13 w14
## 1 ARI HOME @NYG HOME HOME @DEN HOME @OAK HOME @DAL HOME HOME @SEA @ATL HOME
## 2 ATL HOME @CIN HOME @MIN @NYG HOME @BAL HOME HOME @TB @CAR HOME HOME @GB
## 3 BAL HOME HOME @CLE HOME @IND @TB HOME @CIN @PIT HOME HOME @NO HOME @MIA
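The double escaping mentioned above can be checked directly in base R. This is a minimal sketch: the `games` vector is a hypothetical sample built from three values that appear in the schedule, and `grepl` is used only to show which strings the pattern matches.

```r
# In an R string literal, "\\w" is the two-character regex token \w.
pattern <- "^[A-Z]\\w+"

# The literal holds two characters: a backslash and the letter w.
nchar("\\w")

# Hypothetical sample taken from values shown in the schedule.
games <- c("SD", "@NYG", "BYE")

# TRUE where the string starts with a capital letter followed by word characters.
# Note that "BYE" matches as well, because it also starts with a capital letter.
is_home <- grepl(pattern, games)
is_home
```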
Aside from using the stringr library, you can do this with base R's sub if you insist on using a regular expression.
> df[,-1] <- sub('^[A-Z]\\w+', 'HOME', as.matrix(df[,-1]))
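To try the sub approach without the full data set, here is a small reproducible sketch. The data frame below is a reconstruction of only the first four weeks of the snippet in the question, so its shape is an assumption. The second pattern, which leaves BYE untouched by using a negative lookahead with perl = TRUE, is an optional variation and not part of the original answer.

```r
# Reconstruction of the first four weeks from the question's snippet (assumed shape).
df <- data.frame(
  Team = c("ARI", "ATL", "BAL"),
  w1   = c("SD", "NO", "CIN"),
  w2   = c("@NYG", "@CIN", "PIT"),
  w3   = c("SF", "TB", "@CLE"),
  w4   = c("BYE", "@MIN", "CAR"),
  stringsAsFactors = FALSE
)

# Work on the schedule columns only, as a character matrix.
m <- as.matrix(df[, -1])

# Optional variation: skip cells that are exactly "BYE" (needs perl = TRUE for the lookahead).
kept <- sub("^(?!BYE$)[A-Z]\\w+", "HOME", m, perl = TRUE)

# Same pattern as in the answer: every cell starting with a capital letter becomes HOME.
df[, -1] <- sub("^[A-Z]\\w+", "HOME", m)
df
```

With the answer's pattern, the BYE week is also rewritten to HOME, which matches the output shown above. Whether that is desirable depends on how bye weeks should be treated.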
And here is an approach without using a regular expression:
> m <- as.matrix(df[-1])
> m[substr(m,1,1) != '@'] <- 'HOME'
> cbind(df[1], m)
## Team w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13 w14
## 1 ARI HOME @NYG HOME HOME @DEN HOME @OAK HOME @DAL HOME HOME @SEA @ATL HOME
## 2 ATL HOME @CIN HOME @MIN @NYG HOME @BAL HOME HOME @TB @CAR HOME HOME @GB
## 3 BAL HOME HOME @CLE HOME @IND @TB HOME @CIN @PIT HOME HOME @NO HOME @MIA
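As a variation on the non-regex approach, base R's startsWith (available since R 3.3.0) can replace the substr comparison. This is a swapped-in alternative rather than the method used above, and the small matrix here is a hypothetical sample built from values in the schedule.

```r
# Hypothetical two-week sample; matrix() fills column by column.
m <- matrix(c("SD", "NO", "@NYG", "@CIN"), nrow = 2,
            dimnames = list(NULL, c("w1", "w2")))

# Every cell that does not start with "@" is treated as a home game.
m[!startsWith(m, "@")] <- "HOME"
m
```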
Upvotes: 5