Reputation: 1381
How can I remove phone numbers from a string if they are in different formats?
For example I have:
text='
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78'
Also, how can I remove these formats from the text?
09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222
How can I remove these phone numbers?
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
I have tried gsub, but it removes all similar numbers.
Upvotes: 0
Views: 548
Reputation: 110685
phone_formats = [/\(\d{3}\) \d{3}-\d{4}/,
/\d{3}-\d{3}-\d{4}/,
/\d{3} \d{3} \d{4}/,
/\(\d{3}\) \d{3} \d{3} \d{2}/,
/\(\d{3}\) \d{3} \d{2} \d{2}/,
/\(\d{3}\) \d{3}-\d{2}-\d{2}/,
/\d{3}-\d{3}-\d{2}-\d{2}/]
r = Regexp.union(phone_formats)
#=> /(?-mix:\(\d{3}\) \d{3}-\d{4})|
# (?-mix:\d{3}-\d{3}-\d{4})|
# (?-mix:\d{3} \d{3} \d{4})|
# (?-mix:\(\d{3}\) \d{3} \d{3} \d{2})|
# (?-mix:\(\d{3}\) \d{3} \d{2} \d{2})|
# (?-mix:\(\d{3}\) \d{3}-\d{2}-\d{2})|
# (?-mix:\d{3}-\d{3}-\d{2}-\d{2})/
(I have broken Regexp.union's return value after each | for improved readability.)
text =<<_
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18,
TV with stand (inches) : 28.98x18.68x7.78
_
puts text.gsub(r,'')
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18,
TV with stand (inches) : 28.98x18.68x7.78
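If the worry is that gsub also removes other similar-looking numbers, one option is to wrap the union in lookarounds so that a match cannot start or end in the middle of a longer run of digits. This is only a sketch: PHONE_FORMATS, GUARDED_PHONE and remove_phones are names made up for the example, and the format list is an assumption you would adjust to your own data.

```ruby
# Hypothetical format list for illustration; extend it to match your data.
PHONE_FORMATS = [
  /\(\d{3}\) \d{3}-\d{2}-\d{2}/, # (093) 123-34-56
  /\(\d{3}\) \d{3} \d{2} \d{2}/, # (068) 123 45 67
  /\(\d{3}\) \d{3} \d{3} \d{2}/, # (095) 123 456 78
  /\d{3}-\d{3}-\d{2}-\d{2}/      # 091-123-11-22
].freeze

# Interpolating a Regexp inserts it as a self-contained group, so the
# lookbehind and lookahead apply to the whole alternation.
GUARDED_PHONE = /(?<!\d)#{Regexp.union(PHONE_FORMATS)}(?!\d)/

# Returns a copy of text with every guarded phone match removed.
def remove_phones(text)
  text.gsub(GUARDED_PHONE, '')
end
```

With these guards, a string such as "091-123-11-22" is removed when it stands alone, but left in place when it is part of a longer digit sequence.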
Upvotes: 0
Reputation: 11137
You can use:
text.gsub(/\([0-9]*\)\s[0-9]*(-|\s)[0-9]*(-|\s)[0-9]*/, '')
This will remove the phone numbers in the formats that appear in your text. Whenever you write a regex, try testing it in Rubular.
\([0-9]*\) matches the digits inside parentheses (...). Parentheses are special characters in a regex, so each one is escaped with \. [0-9] matches a single digit, and since there is more than one digit inside, the * means zero or more digits.
\s matches the whitespace character that follows.
(-|\s) matches a dash (-) or (|) a whitespace character (\s).
For the other formats, use the following in addition to the one above:
text.gsub(/\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/, '')
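To make the piece-by-piece explanation easier to keep next to the code, the first pattern can be written in Ruby's extended (/x) mode, where whitespace is ignored and each part can carry a comment. This is a sketch rather than a drop-in replacement: PAREN_PHONE and strip_paren_phones are hypothetical names, and it deliberately uses + instead of * so that every group needs at least one digit.

```ruby
# Extended-mode version of the parenthesised pattern. Whitespace inside the
# literal is ignored, so literal spaces are matched with \s.
PAREN_PHONE = /
  \( [0-9]+ \)   # digits inside literal parentheses, e.g. (093)
  \s             # one whitespace character
  [0-9]+         # first group of digits
  (?: - | \s )   # a dash or a whitespace character
  [0-9]+         # second group of digits
  (?: - | \s )   # a dash or a whitespace character
  [0-9]+         # last group of digits
/x

# Returns a copy of text with every PAREN_PHONE match removed.
def strip_paren_phones(text)
  text.gsub(PAREN_PHONE, '')
end
```

Lines such as "Backlight: LED (Full Array)" are left alone because the parentheses there do not contain digits.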
Upvotes: 3
Reputation: 160551
If your text has a fixed format, where the numbers are always on the first line of the block, then simply remove the first line:
text='
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78'
text.strip
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
text.strip.lines[1..-1].join
# => " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
Or:
lines = text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
lines.shift
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n"
lines.join
# => " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
Using a regex and gsub can work, but it's also more likely to become a maintenance problem.
If the phone numbers will always be on one line, but not necessarily the first, then I'd still use lines to break the text into an array, but I'd use reject with a regex to check each line and drop any line containing a phone-number-like match:
lines = text.lines
lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }
# => ["\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }.join
# => "\n Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
Note that not using strip results in the leading "\n" being retained.
Using lines to transform the text into an array helps isolate the damage if something else happens to trigger the pattern match.
Where this approach breaks down is when the phone numbers are scattered throughout the text. I'd still probably use this approach to reduce the text to individual lines though, again to reduce the possible damage if there are false positives.
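For the scattered case, a minimal sketch of that per-line idea might look like the following. LINE_PHONE and scrub_lines are hypothetical names, the pattern covers only the parenthesised formats from the question, and dropping lines that become blank is an assumption about the desired output.

```ruby
# Hypothetical pattern covering only the parenthesised formats.
LINE_PHONE = /\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/

# Removes phone-like matches one line at a time. Lines without a match are
# passed through untouched; lines that held nothing but numbers are dropped.
def scrub_lines(text)
  text.lines.map { |line|
    next line unless line.match?(LINE_PHONE)  # untouched lines pass through
    cleaned = line.gsub(LINE_PHONE, '')
    cleaned.strip.empty? ? nil : cleaned      # drop lines left blank
  }.compact.join
end
```

Because gsub only ever runs on lines that already matched, any false positive is confined to that one line.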
Upvotes: 0
Reputation: 11032
For the formats in your example, the following regex works:
/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/
Ruby code:
print text.gsub(/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/, "")
Upvotes: 1