MostlyRquestions

Reputation: 566

Using regular expressions with R

I have a character vector in R. Some of the strings have a '(number)' pattern appended. I'm trying to remove this '(number)' suffix using regular expressions but cannot figure it out. I can find the elements where the string has a whitespace followed by a character, but there must be a way to match these number strings specifically.

    dat <- c("Alabama-Birmingham", "Arizona State", "Canisius", "UCF", "George Washington", 
             "Green Bay", "Iona", "Louisville (7)", "UMass", "Memphis", "Michigan State", 
             "Milwaukee", "Nebraska", "Niagara", "Northern Kentucky", "Notre Dame (21)", 
             "Quinnipiac", "Siena", "Tulsa", "Washington State", "Wright State", 
             "Xavier")

    rows <- grep(" (.*)", dat)
    fixed <- gsub(" (.*)","",games[rows,])
    dat = fixed

Upvotes: 2

Views: 620

Answers (2)

akrun

Reputation: 886948

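We can do this with sub, matching any whitespace followed by a literal opening parenthesis and everything after it, and replacing the match with an empty string.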

sub("\\s*\\(.*", "", dat)
#[1] "Alabama-Birmingham" "Arizona State"      "Canisius"          
#[4] "UCF"                "George Washington"  "Green Bay"         
#[7] "Iona"               "Louisville"         "UMass"             
#[10] "Memphis"            "Michigan State"     "Milwaukee"         
#[13] "Nebraska"           "Niagara"            "Northern Kentucky" 
#[16] "Notre Dame"         "Quinnipiac"         "Siena"             
#[19] "Tulsa"              "Washington State"   "Wright State"      
#[22] "Xavier"            
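
One thing worth noting about this pattern: it drops everything from the first opening parenthesis onward, whether or not the contents are numeric. A small sketch of that behavior follows; the vector x is a hypothetical input, not part of the question's data.

```r
# Hypothetical input (not from the question): one numeric suffix,
# one non-numeric parenthesized suffix, one plain name.
x <- c("Louisville (7)", "Miami (FL)", "Xavier")

# \\s* matches optional whitespace, \\( a literal "(", and .* the rest of the string.
stripped <- sub("\\s*\\(.*", "", x)
print(stripped)
```

If only numeric suffixes should be removed, a more specific pattern such as one requiring digits inside the parentheses may be preferable.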

Upvotes: 1

G5W

Reputation: 37641

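First, you need to escape the parentheses; unescaped, they form a capture group instead of matching literal characters. It would also be good to be more specific about what is inside them, for example by requiring digits.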

gsub("\\s+\\(\\d+\\)", "", dat)
 [1] "Alabama-Birmingham" "Arizona State"      "Canisius"          
 [4] "UCF"                "George Washington"  "Green Bay"         
 [7] "Iona"               "Louisville"         "UMass"             
[10] "Memphis"            "Michigan State"     "Milwaukee"         
[13] "Nebraska"           "Niagara"            "Northern Kentucky" 
[16] "Notre Dame"         "Quinnipiac"         "Siena"             
[19] "Tulsa"              "Washington State"   "Wright State"      
[22] "Xavier" 
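
To make the difference concrete, here is a minimal sketch comparing the unescaped pattern from the question with an escaped, digit-specific one. The vector dat_small is a hypothetical subset of the question's data, and the trailing $ anchor is an added assumption (that the suffix always sits at the end of the string), not part of the pattern above.

```r
# Hypothetical subset of the question's data, chosen for illustration.
dat_small <- c("Arizona State", "Louisville (7)", "Notre Dame (21)")

# Unescaped: ( ) is a capture group, so " (.*)" matches the first space
# and everything after it, truncating names that contain a space.
unescaped <- gsub(" (.*)", "", dat_small)

# Escaped and digit-specific: \\( and \\) match literal parentheses,
# \\d+ requires one or more digits, and $ anchors the match at the end.
escaped <- gsub("\\s+\\(\\d+\\)$", "", dat_small)

print(unescaped)
print(escaped)
```

With the unescaped pattern, "Arizona State" is cut down to "Arizona", which is why escaping matters here.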

Upvotes: 3
