aanchal jain

Reputation: 85

Regex to find short URLs in a Twitter text file

I need to find short URLs in the text of a post in Java. I have the following regex: "(http://(bit\.ly|t\.co|lnkd\.in|tcrn\.ch).*?)\s"

I have two questions:

  1. The problem with the above expression is that it doesn't match the short URL if it is at the end of the line. For example, for the text "blah http://linkd.in/R9Msf3 blah" it gives "http://linkd.in/R9Msf3 ".

    But for "blah blah http://linkd.in/R9Msf3" it does not give "http://linkd.in/R9Msf3".

    Any suggestions on how to match both cases? Basically, I just need to remove the short URL from the text.

  2. Also, is there a better way to recognize all short URL formats? If I hard-code them, then every time a new shortener appears I have to add its format to the config.
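A minimal reproduction of problem 1, assuming the regex is used with `Matcher.find()`. The class and method names are hypothetical. The inputs use `lnkd.in`, the spelling in the regex's alternation, because `linkd.in` as written in the examples would not match the alternation at all. The trailing `\s` must consume an actual character, so a URL at the very end of the input produces no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortUrlRepro {
    // The regex from the question: lazy .*? after the domain, then exactly one whitespace char.
    static final Pattern ORIGINAL =
        Pattern.compile("(http://(bit\\.ly|t\\.co|lnkd\\.in|tcrn\\.ch).*?)\\s");

    // Returns capture group 1 of the first match (the URL without the trailing space),
    // or null when the pattern finds nothing.
    static String firstUrl(String text) {
        Matcher m = ORIGINAL.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // URL followed by a space: matches.
        System.out.println(firstUrl("blah http://lnkd.in/R9Msf3 blah"));
        // URL at the end of the input: there is no whitespace for \s to consume, so no match.
        System.out.println(firstUrl("blah blah http://lnkd.in/R9Msf3"));
    }
}
```

Note that `m.group()` (the whole match) includes the trailing space, which is where the "http://linkd.in/R9Msf3 " result comes from; `m.group(1)` excludes it.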

Upvotes: 2

Views: 2818

Answers (2)

Brigham

Reputation: 14554

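Instead of .*? use \S* so the match cannot run past whitespace. Since \S* already stops at the first whitespace character, you don't need the lazy ?, and you can use \b instead of \s: \b is a zero-width word boundary, so it matches between the end of the URL and either the following whitespace or the end of the string without consuming a character.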

(http://(bit\.ly|t\.co|lnkd\.in|tcrn\.ch)\S*)\b
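A sketch of this pattern in Java, assuming the shortener domains come from a list (for example, read from the config file mentioned in question 2), so that adding a new shortener means adding a list entry rather than editing the regex. `buildPattern`, `findAll` and `strip` are hypothetical helper names; `Pattern.quote` escapes each domain so its dots are literal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ShortUrlStripper {
    // Builds (http://(d1|d2|...)\S*)\b from a list of shortener domains.
    static Pattern buildPattern(List<String> domains) {
        String alternation = domains.stream()
            .map(Pattern::quote)              // "bit.ly" -> "\Qbit.ly\E", so '.' is literal
            .collect(Collectors.joining("|"));
        return Pattern.compile("(http://(" + alternation + ")\\S*)\\b");
    }

    // Collects capture group 1 (the URL) of every match.
    static List<String> findAll(Pattern p, String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    // Deletes every matched URL; the surrounding whitespace is left untouched.
    static String strip(Pattern p, String text) {
        return p.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        Pattern p = buildPattern(Arrays.asList("bit.ly", "t.co", "lnkd.in", "tcrn.ch"));
        String text = "blah http://lnkd.in/R9Msf3 blah http://bit.ly/abc";
        System.out.println(findAll(p, text));
        System.out.println(strip(p, text));
    }
}
```

Both the mid-line and the end-of-line URL are found. If a URL ends in a non-word character (for example a trailing "/"), \b makes \S* backtrack to the last letter or digit, so that character is left out of the match.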

Upvotes: 2

gtgaxiola
gtgaxiola

Reputation: 9331

Try (\s|$) at the end of your regex, so the match ends at either a whitespace character or the end of the input:

http://(linkd\.in|t\.co|bit\.ly|tcrn\.ch).*?(\s|$)
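A minimal Java sketch of the (\s|$) approach, using the domain list from the question's regex (bit.ly, t.co, lnkd.in, tcrn.ch). The class and method names are hypothetical. Because the whole match may include the terminating whitespace, `firstUrl` trims it off, and `strip` replaces each match with group 2 (the terminator) so the word spacing is kept.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortUrlEndOfLine {
    // The lazy .*? stops at the first whitespace character OR at the end of the input.
    // Group 1 is the domain; group 2 is the terminator (one whitespace char, or empty at $).
    static final Pattern P =
        Pattern.compile("http://(bit\\.ly|t\\.co|lnkd\\.in|tcrn\\.ch).*?(\\s|$)");

    // First URL in the text without its terminating whitespace, or null if none.
    static String firstUrl(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group().trim() : null;
    }

    // Removes each URL but keeps the whitespace that ended it ("$2").
    static String strip(String text) {
        return P.matcher(text).replaceAll("$2");
    }

    public static void main(String[] args) {
        System.out.println(firstUrl("blah http://lnkd.in/R9Msf3 blah"));
        System.out.println(firstUrl("blah blah http://lnkd.in/R9Msf3"));
        System.out.println("[" + strip("blah http://lnkd.in/R9Msf3 blah") + "]");
    }
}
```

Unlike the original regex, the end-of-line URL now matches because $ succeeds at the end of the input.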

Tested with RegexPal


Upvotes: 0
