Reputation: 566
I have a character vector in R. Some of the strings have a '(number)' pattern appended. I'm trying to remove this '(number)' suffix using regular expressions but cannot figure it out. I can find the elements where the string contains whitespace followed by a character, but there must be a way to match just these number strings.
dat <- c("Alabama-Birmingham", "Arizona State", "Canisius", "UCF", "George Washington",
"Green Bay", "Iona", "Louisville (7)", "UMass", "Memphis", "Michigan State",
"Milwaukee", "Nebraska", "Niagara", "Northern Kentucky", "Notre Dame (21)",
"Quinnipiac", "Siena", "Tulsa", "Washington State", "Wright State",
"Xavier")
rows <- grep(" (.*)", dat)
fixed <- gsub(" (.*)","",games[rows,])
dat = fixed
Upvotes: 2
Views: 620
Reputation: 886948
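We can do this with sub(), matching optional whitespace, a literal opening parenthesis, and everything after it, then replacing the match with an empty string: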
sub("\\s*\\(.*", "", dat)
#[1] "Alabama-Birmingham" "Arizona State" "Canisius"
#[4] "UCF" "George Washington" "Green Bay"
#[7] "Iona" "Louisville" "UMass"
#[10] "Memphis" "Michigan State" "Milwaukee"
#[13] "Nebraska" "Niagara" "Northern Kentucky"
#[16] "Notre Dame" "Quinnipiac" "Siena"
#[19] "Tulsa" "Washington State" "Wright State"
#[22] "Xavier"
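One caveat worth keeping in mind: because the pattern ends in `.*`, it removes everything from the first opening parenthesis to the end of the string, whether or not the parentheses hold a number. The sketch below illustrates this on a few made-up strings (they are hypothetical and not part of the original data), and contrasts it with a stricter, end-anchored pattern that is an assumption about what is wanted rather than part of the answer above.

```r
# Hypothetical strings for illustration only; not from the original data
x <- c("Louisville (7)", "Miami (OH)", "Notre Dame (21) vs. Tulsa")

# The pattern from above: strips from the first "(" to the end of the string
greedy <- sub("\\s*\\(.*", "", x)
greedy
# [1] "Louisville" "Miami"      "Notre Dame"

# Assumed stricter variant: only a trailing "(digits)" group is removed
strict <- sub("\\s*\\(\\d+\\)$", "", x)
strict
# [1] "Louisville"                "Miami (OH)"
# [3] "Notre Dame (21) vs. Tulsa"
```

For the data in the question both patterns give the same result, since every parenthesised suffix there is a number at the end of the string.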
Upvotes: 1
Reputation: 37641
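First, you need to escape the parentheses; unescaped, they define a capture group instead of matching literal characters. It would also be good to be more specific about what is inside them, here one or more digits: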
gsub("\\s+\\(\\d+\\)", "", dat)
#[1] "Alabama-Birmingham" "Arizona State" "Canisius"
#[4] "UCF" "George Washington" "Green Bay"
#[7] "Iona" "Louisville" "UMass"
#[10] "Memphis" "Michigan State" "Milwaukee"
#[13] "Nebraska" "Niagara" "Northern Kentucky"
#[16] "Notre Dame" "Quinnipiac" "Siena"
#[19] "Tulsa" "Washington State" "Wright State"
#[22] "Xavier"
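To see why the escaping matters, here is a small sketch comparing the unescaped pattern from the question with the escaped one. The vector `x` is a made-up sample for illustration ("Miami (OH)" is hypothetical and not in the original data).

```r
# Small sample for illustration; "Miami (OH)" is a hypothetical entry
x <- c("Arizona State", "Louisville (7)", "Miami (OH)")

# Unescaped: "(.*)" is a capture group, so the pattern means
# "a space followed by anything" and truncates at the first space
unescaped <- gsub(" (.*)", "", x)
unescaped
# [1] "Arizona"    "Louisville" "Miami"

# Escaped and digit-specific: only whitespace plus "(digits)" is removed
escaped <- gsub("\\s+\\(\\d+\\)", "", x)
escaped
# [1] "Arizona State" "Louisville"    "Miami (OH)"
```

The unescaped version damages names such as "Arizona State" that merely contain a space, which is why it cannot be used to clean the whole vector.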
Upvotes: 3