user

Reputation: 1381

Remove phone number from text

How can I remove phone numbers from a string if they are in different formats?

For example, I have:

text='
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
    Smart Functionality: Yes - xx TV Streaming Platform
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78'

Also, how can I remove these formats from the text?

 09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222

How can I remove these phone numbers?

(093) 123-34-56 (068) 123 45 67 (095) 123 456 78

I have tried gsub, but it removes all similar numbers.

Upvotes: 0

Views: 548

Answers (4)

Cary Swoveland

Reputation: 110685

phone_formats = [/\(\d{3}\) \d{3}-\d{4}/,
                 /\d{3}-\d{3}-\d{4}/,
                 /\d{3} \d{3} \d{4}/,
                 /\(\d{3}\) \d{3} \d{3} \d{2}/,
                 /\(\d{3}\) \d{3} \d{2} \d{2}/,
                 /\(\d{3}\) \d{3}-\d{2}-\d{2}/,
                 /\d{3}-\d{3}-\d{2}-\d{2}/]

r = Regexp.union(phone_formats)
  #=> /(?-mix:\(\d{3}\) \d{3}-\d{4})|
  #    (?-mix:\d{3}-\d{3}-\d{4})|
  #    (?-mix:\d{3} \d{3} \d{4})|
  #    (?-mix:\(\d{3}\) \d{3} \d{3} \d{2})|
  #    (?-mix:\(\d{3}\) \d{3} \d{2} \d{2})|
  #    (?-mix:\(\d{3}\) \d{3}-\d{2}-\d{2})|
  #    (?-mix:\d{3}-\d{3}-\d{2}-\d{2})/

(I have broken the Regexp.union's return value after each | for improved readability.)

text =<<_
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18,
TV with stand (inches) : 28.98x18.68x7.78
_

puts text.gsub(r,'')

Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18,
TV with stand (inches) : 28.98x18.68x7.78
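
Removing the numbers with gsub leaves behind the spaces that separated them, so the first line ends up containing only whitespace. A minimal sketch of one way to drop such lines, assuming a shortened sample string and only the three bracketed formats (the names phone, sample and cleaned are illustrative, not from the answer):

```ruby
# Assumption: only the three bracketed formats from the question are needed here.
phone = Regexp.union(
  /\(\d{3}\) \d{3} \d{3} \d{2}/,
  /\(\d{3}\) \d{3} \d{2} \d{2}/,
  /\(\d{3}\) \d{3}-\d{2}-\d{2}/
)

# Shortened, hypothetical sample in the same shape as the question's text.
sample = "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\nRefresh Rate: 60Hz (Native).\n"

# Remove the numbers, then drop any line that is left with only whitespace.
cleaned = sample.gsub(phone, '').lines.reject { |line| line.strip.empty? }.join

puts cleaned
```

Lines that still hold other text keep their content; only lines that become blank are dropped.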

Upvotes: 0

mohamed-ibrahim

Reputation: 11137

You can use:

text.gsub(/\([0-9]*\)\s[0-9]*(-|\s)[0-9]*(-|\s)[0-9]*/, '')

This removes phone numbers in the formats used in your text:

  • (XXX) XXX-XX-XX
  • (XXX) XXX XX XX

When you are writing a regex, it helps to test it with Rubular. Here is how the pattern works:

  • \([0-9]*\) matches digits inside literal parentheses. Parentheses are special characters in a regex, so each one is escaped with \. [0-9] matches a single digit, and * means zero or more of them,

  • \s matches the whitespace character that follows,

  • (-|\s) matches either a dash (-) or whitespace (\s); | means OR.
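
The breakdown above can also be written into the pattern itself with Ruby's /x (extended) flag, which ignores whitespace in the pattern and allows comments. This is only a sketch: it assumes fixed digit counts ({3}, {2,3}, {2}) in place of the original *, which is a tightening the answer does not make, and phone and line are illustrative names.

```ruby
# Assumption: digit counts are bounded, unlike the * in the pattern above.
phone = /
  \( [0-9]{3} \)   # three digits inside literal parentheses
  \s               # one whitespace character
  [0-9]{3}         # first group of three digits
  [-\s]            # a dash or a whitespace character
  [0-9]{2,3}       # second group of two or three digits
  [-\s]            # a dash or a whitespace character
  [0-9]{2}         # last group of two digits
/x

line = "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78"

# Only the separating spaces between the numbers remain.
p line.gsub(phone, '')
```

Bounding the counts makes the pattern less likely to swallow unrelated digits, at the cost of having to list each accepted length.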

For other formats such as:

  • XXXXXXXXXX
  • XXX-XX-XX-XXX
  • (XXX)XXXXXXX

in addition to the ones above, use the following:

text.gsub(/\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/, '')
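
Because this pattern relies on unbounded + and * quantifiers, it may be worth checking it against each format before relying on it. A small sketch of such a check, using the sample numbers from the question plus one dimension string that should stay untouched (pattern and samples are illustrative names):

```ruby
# The pattern from the answer above, unchanged.
pattern = /\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/

# Sample numbers from the question, plus a dimension string as a negative case.
samples = ["09414241441", "095-41-41-441", "(096)4141441",
           "091-123-11-22", "094 00 111 222", "28.98x17x3.18"]

# Print what is left of each sample after the pattern is removed.
samples.each do |s|
  leftover = s.gsub(pattern, '')
  puts "#{s.inspect} -> #{leftover.inspect}"
end
```

For a phone number, anything other than an empty string on the right means part of the input was left behind; for the dimension string, anything other than the original means the pattern removed too much.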

Upvotes: 3

the Tin Man

Reputation: 160551

If your text has a fixed format, such that the numbers are always on the first line of the block, then simply remove the first line:

text='
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
    Smart Functionality: Yes - xx TV Streaming Platform
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78'

text.strip
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", "    Smart Functionality: Yes - xx TV Streaming Platform\n", "    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
text.strip.lines[1..-1].join
# => "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"

Or:

lines = text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", "    Smart Functionality: Yes - xx TV Streaming Platform\n", "    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
lines.shift
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n"
lines.join
# => "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"

Using a regex and gsub can work, but it's also more likely to become a maintenance problem.

If the phone numbers are always on a single line, but not necessarily the first, I'd still use lines to break the text into an array, then use reject with a regex to drop whichever line contains a phone-number-like match:

lines = text.lines
lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }
# => ["\n", "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", "    Smart Functionality: Yes - xx TV Streaming Platform\n", "    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]

lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }.join
# => "\n    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"

Note that not using strip results in the leading "\n" being retained.

Using lines to split the text into an array helps contain the damage if something else happens to match the pattern.

This approach breaks down when the phone numbers are scattered throughout the text. I'd probably still split the text into individual lines, though, again to limit the damage from false positives.
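
For the scattered case, one possible sketch is to apply gsub line by line and drop only the lines that end up empty. The sample text and the names phone, text and cleaned are hypothetical, and the pattern assumes only the bracketed formats from the question:

```ruby
# Assumed pattern: the bracketed formats from the question only.
phone = /\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/

# Hypothetical text with numbers both inside a sentence and on their own line.
text = "Call (093) 123-34-56 today\n    Refresh Rate: 60Hz\n(068) 123 45 67\n    Size: 28.98x17x3.18\n"

cleaned = text.lines.map { |line|
  next line unless line.match?(phone)  # lines without a match pass through as-is
  rest = line.gsub(phone, '')          # strip the number, keep the rest of the line
  rest.strip.empty? ? nil : rest       # drop lines that held nothing but numbers
}.compact.join

puts cleaned
```

Each line is handled independently, so a false positive can only alter the line it occurs on.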

Upvotes: 0

rock321987

Reputation: 11032

For your format, the following regex works:

/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/

Ruby Code

print text.gsub(/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/, "")
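
The pattern above targets the bracketed numbers in the first list only. For the second list in the question, a similarly bounded sketch might enumerate each layout explicitly. The digit groupings below are assumptions read off the five sample numbers, and more_phones and sample are illustrative names:

```ruby
# Assumed layouts, taken from the second list of numbers in the question.
more_phones = Regexp.union(
  /\(\d{3}\)\d{7}/,               # bracketed code followed by seven digits
  /\b\d{3}-\d{2}-\d{2}-\d{3}\b/,  # 3-2-2-3 digits separated by dashes
  /\b\d{3}-\d{3}-\d{2}-\d{2}\b/,  # 3-3-2-2 digits separated by dashes
  /\b\d{3} \d{2} \d{3} \d{3}\b/,  # 3-2-3-3 digits separated by spaces
  /\b\d{10,11}\b/                 # bare run of 10 or 11 digits
)

# Hypothetical sample: the five numbers surrounded by unrelated text.
sample = "call 09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222 size 28.98x17x3.18"

# Remove the numbers, then collapse the runs of spaces they leave behind.
puts sample.gsub(more_phones, '').squeeze(' ')
```

The \b word boundaries keep the unbracketed alternatives from matching inside longer digit runs, and the decimal dimensions are left alone because none of the layouts allow a dot.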

Ideone Demo

Upvotes: 1
