Reputation: 413
I'm trying to clean up a column in my data frame where the rows look like this:
1234, text ()
and I need to keep just the number in all the rows. I used:
df$column = gsub(", text ()", "", df$column)
and got this:
1234()
I repeated the operation with only the parentheses, but they won't go away. I wasn't able to find an example that deals specifically with removing parentheses as unwanted text. sub doesn't work either.
Does anyone know why this isn't working?
Upvotes: 1
Views: 1786
Reputation: 525
If your column always follows the format described above:
1234, text ()
Something like the following should work (note that this is C#/.NET syntax, not R):
string extractedNumber = Regex.Match(INPUT_COLUMN, @"^\d{4,}").Value;
It reads: from the start of the string, match four or more digits.
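Since the question is about R, here is a rough base R sketch of the same idea, offered as one possible translation rather than a definitive one. The sample vector x is made up for illustration and stands in for df$column.

```r
# Hypothetical sample values in the format from the question
x <- c("1234, text ()", "56789, text ()")

# regexpr() locates the first match of "four or more digits at the start";
# regmatches() then extracts the matched text
leading_digits <- regmatches(x, regexpr("^[0-9]{4,}", x))
print(leading_digits)
```

One caveat: regmatches() drops elements that have no match, so assigning the result back to a column only works if every row matches the pattern.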
Upvotes: 0
Reputation: 92292
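Parentheses are metacharacters in regex. In your pattern, the unescaped () is read as an empty group that matches nothing, so only ", text " was removed and the literal parentheses stayed. You should escape them using \\ or [], or add fixed = TRUE. But in your case you just want to keep the number, so just remove everything else using \\D: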
gsub("\\D", "", "1234, text ()")
## [1] "1234"
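For completeness, the three ways of matching the parentheses literally could look roughly like this. It is a minimal sketch using the sample string from the question; the variable names are only for illustration.

```r
x <- "1234, text ()"

# 1. Escape each parenthesis with a double backslash
escaped <- gsub(", text \\(\\)", "", x)

# 2. Wrap each parenthesis in a character class
bracketed <- gsub(", text [(][)]", "", x)

# 3. Treat the whole pattern as a literal string instead of a regex
literal <- gsub(", text ()", "", x, fixed = TRUE)

print(c(escaped, bracketed, literal))
```

All three leave just "1234" for this input. fixed = TRUE is probably the simplest choice when the unwanted text is always exactly the same.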
Upvotes: 2