Reputation: 8376
I'm consuming the Twitter API and getting several strings (tweets) that contain links, that is, substrings beginning with 'http://'.
How can I get rid of such links? That is, I want to remove the whole word.
Let's say I have:
'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre http://t.co/Ad2oWDNd4u'
And I want to obtain:
'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre'
Such substrings may appear anywhere in the string.
Upvotes: 0
Views: 8209
Reputation: 473763
You can use re.sub() to replace all links with an empty string:
>>> import re
>>> pattern = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
>>> s = 'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre http://t.co/Ad2oWDNd4u'
>>> pattern.sub('', s)
'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre '
It removes every link, wherever it appears in the string:
>>> s = "I've used google https://google.com and found a regular expression pattern to find links here https://stackoverflow.com/questions/6883049/regex-to-find-urls-in-string-in-python"
>>> pattern.sub('', s)
"I've used google  and found a regular expression pattern to find links here "
The regular expression was taken from another thread.
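Note that the substitution leaves the surrounding spaces behind (a trailing space in the first example, a doubled space in the second). A minimal sketch of one way to get exactly the output asked for in the question follows. The pattern `https?://\S+` and the helper name `strip_links` are assumptions made for illustration, not the pattern quoted above: it simply treats any run of non-whitespace characters after the scheme as part of the link.

```python
import re

# Simplified pattern (an assumption, not the one quoted in the answer):
# an http:// or https:// prefix followed by any run of non-whitespace characters.
URL_RE = re.compile(r'https?://\S+')

def strip_links(text):
    """Remove URL-like words, then collapse the whitespace left behind."""
    without_links = URL_RE.sub('', text)
    # split() with no arguments drops leading, trailing and repeated whitespace.
    return ' '.join(without_links.split())

s = 'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre http://t.co/Ad2oWDNd4u'
print(strip_links(s))
# prints: Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre
```

Because `\S+` is greedy, punctuation directly attached to a link (for example a full stop right after the URL) is removed along with it, and joining on single spaces also normalizes any other whitespace in the tweet. That is usually fine for tweets but worth knowing.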
Upvotes: 4
Reputation: 32189
If the link is always the last word of the string, you can slice it off:
s[:s.index('http://')-1]
If it doesn't always appear at the end, you can do:
your_list = s.split()
i = 0
while i < len(your_list):
    if your_list[i].startswith('http://'):
        del your_list[i]
    else:
        i += 1
s = ' '.join(your_list)
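The same split-and-filter idea can probably be written more compactly with a list comprehension. This is only a sketch: the function name `drop_link_words` and the extra `'https://'` prefix are assumptions added for illustration, since the question only mentions `'http://'`.

```python
def drop_link_words(text, prefixes=('http://', 'https://')):
    """Return text without the whitespace-separated words that start with a link prefix."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    words = [word for word in text.split() if not word.startswith(prefixes)]
    return ' '.join(words)

s = 'Mi grupo favorito http://t.co/Ad2oWDNd4u de CRIMINALISTICA.'
print(drop_link_words(s))
# prints: Mi grupo favorito de CRIMINALISTICA.
```

Like the loop above, this only catches links that begin a word, and it rebuilds the string with single spaces between words.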
Upvotes: 0