carlite71

Reputation: 413

Removing parentheses as unwanted text in R using gsub

I'm trying to clean up a column in my data frame where the rows look like this:

1234, text ()

and I need to keep just the number in all the rows. I used:

df$column = gsub(", text ()", "", df$column)

and got this:

1234()

I repeated the operation with only the parentheses, but they won't go away. I wasn't able to find an example that deals specifically with parentheses being eliminated as unwanted text. sub doesn't work either.

Does anyone know why this isn't working?

Upvotes: 1

Views: 1786

Answers (2)

Ryan Lutz

Reputation: 525

If your column always follows the format described above:

1234, text ()

Something like the following should work (this is .NET/C# regex syntax, not R):

string extractedNumber = Regex.Match(INPUT_COLUMN, @"^\d{4,}").Value;

This reads as: from the start of the string, match four or more digits.
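
Since the question is about R and the snippet above uses .NET syntax, here is a rough sketch of how the same pattern might be applied in R. The vector x and its sample values are made up for illustration; in the question it would be df$column.

```r
# Hypothetical sample data standing in for df$column
x <- c("1234, text ()", "5678, other ()")

# regexpr() locates the first match of the pattern in each element;
# regmatches() then extracts the matched substrings
digits <- regmatches(x, regexpr("^\\d{4,}", x))
digits
## [1] "1234" "5678"
```

Note that regmatches() drops elements with no match, so the result can be shorter than the input if some rows do not start with four or more digits.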

Upvotes: 0

David Arenburg

Reputation: 92292

Parentheses are metacharacters in regex. You should escape them with \\, wrap them in [], or add fixed = TRUE. But in your case you just want to keep the number, so just remove everything else using \\D (any non-digit character):

gsub("\\D", "", "1234, text ()")
## [1] "1234"
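
For completeness, here is a small sketch of the three escaping options applied to the pattern from the question. The unescaped () is parsed as an empty group rather than as literal characters, so the original call only matched ", text " and left "()" behind. The variable x is just an illustrative stand-in for one value of df$column.

```r
x <- "1234, text ()"

# Unescaped: "()" is an empty group, so the literal parentheses remain
gsub(", text ()", "", x)
## [1] "1234()"

# Option 1: escape each parenthesis with a double backslash
gsub(", text \\(\\)", "", x)
## [1] "1234"

# Option 2: put each parenthesis inside a character class
gsub(", text [(][)]", "", x)
## [1] "1234"

# Option 3: treat the whole pattern as a literal string
gsub(", text ()", "", x, fixed = TRUE)
## [1] "1234"
```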

Upvotes: 2
