How to remove words containing a substring in a Python string?

Question

While consuming the Twitter API, I get several strings (tweets) containing links, that is, substrings beginning with 'http://'.

How can I get rid of such links? That is, I want to remove the whole word.

Let's say I have:

'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre http://t.co/Ad2oWDNd4u'

And I want to obtain:

'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre'

Such substrings may appear anywhere in the string.

alecxe · Accepted Answer

You can use re.sub() to replace all links with an empty string:

>>> import re
>>> pattern = re.compile('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
>>> s = 'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre http://t.co/Ad2oWDNd4u'
>>> pattern.sub('', s)
'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre '

It replaces all the links in the string anywhere inside it:

>>> s = "I've used google https://google.com and found a regular expression pattern to find links here https://stackoverflow.com/questions/6883049/regex-to-find-urls-in-string-in-python"
>>> pattern.sub('', s)
"I've used google  and found a regular expression pattern to find links here "
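Note that both outputs above keep the whitespace that surrounded each link (a trailing space in the first, a double space in the second), while the question asks for a string without it. A minimal sketch of one way to tidy that up, assuming it is acceptable to collapse every run of whitespace into a single space; the helper name remove_links is hypothetical and not part of the original answer:

```python
import re

# Same pattern as in the answer above.
URL_PATTERN = re.compile(
    'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)


def remove_links(text):
    """Hypothetical helper: drop links, then normalize leftover whitespace."""
    without_links = URL_PATTERN.sub('', text)
    # split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with ' ' collapses the gaps.
    return ' '.join(without_links.split())
```

If the original spacing elsewhere in the tweet matters (for example, newlines), calling .strip() on the substituted string instead would only trim the ends.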

The regular expression was taken from this thread:

Regex to find urls in string in Python
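For the more literal reading of the title (dropping every whitespace-separated word that contains a given substring), a regular expression is not strictly needed. The following is only a sketch of that alternative, not the method from the answer above; the function name remove_words_containing and its default argument are assumptions made for illustration:

```python
def remove_words_containing(text, substring='http://'):
    """Hypothetical helper: drop each whitespace-separated word containing substring."""
    # split() breaks the text on any whitespace; words with the substring are skipped.
    kept = [word for word in text.split() if substring not in word]
    # Rejoining with single spaces also removes the gap left by a dropped word.
    return ' '.join(kept)
```

Passing substring='http' would also catch 'https://' links, at the cost of removing any ordinary word that happens to contain 'http'. Unlike the regex approach, this removes the whole word, including any punctuation attached to the link.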
