Reputation: 323
I scraped tweet status links, and I'm trying to remove certain substrings from them. However, it doesn't work correctly: it seems to remove only the first string in "stopwords".
Code:
stopwords = ['/people', '/photo/1']
link_list = []
for link in links:
    for i in stopwords:
        remove = link.replace(i, "")
        link = remove
        link_list.append(link)
Output:
https://twitter.com/CultOfCurtis/status/1492292326051483648
https://twitter.com/ZBumblenuts/status/1492292306149560321
https://twitter.com/AndreWillemse4/status/1492292279129804806
https://twitter.com/JaimeeJakobczak/status/1492292268354584578
https://twitter.com/consequence/status/1492245783084773383/photo/1
https://twitter.com/consequence/status/1492245783084773383
https://twitter.com/EVStyle2/status/1492292266169298944
https://twitter.com/SammyMorgan/status/1492292246766436355
https://twitter.com/gayesian/status/1492292246456184841
https://twitter.com/khendriix_/status/1492292245734707202
https://twitter.com/Mauro_Sosa_S/status/1492292242320539650
I tried different approaches after researching, but to no avail. :/
Upvotes: 0
Views: 72
Reputation: 3357
You just need to de-indent the last line there:
stopwords = ['/people', '/photo/1']
link_list = []
for link in links:
    for i in stopwords:
        remove = link.replace(i, "")
        link = remove
    # Append once per link, after every stopword has been removed
    link_list.append(link)
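In its original position, inside the inner loop, it would append the link with /people removed but before removing /photo/1. Then it would append it again with /photo/1 removed. So every link ended up in link_list twice, and the first copy could still contain /photo/1.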
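To see the difference concretely, here is a minimal, self-contained reproduction. The links list is an assumption standing in for the scraped data (two URLs copied from the question's output), and the function names buggy and fixed are hypothetical. The buggy version appends once per stopword; the fixed version appends once per link.

```python
stopwords = ['/people', '/photo/1']

# Sample input, assumed for illustration (copied from the question's output)
links = [
    'https://twitter.com/CultOfCurtis/status/1492292326051483648',
    'https://twitter.com/consequence/status/1492245783084773383/photo/1',
]

def buggy(links):
    link_list = []
    for link in links:
        for i in stopwords:
            link = link.replace(i, "")
            link_list.append(link)  # inside the inner loop: runs once per stopword
    return link_list

def fixed(links):
    link_list = []
    for link in links:
        for i in stopwords:
            link = link.replace(i, "")
        link_list.append(link)  # after the inner loop: runs once per link
    return link_list

print(len(buggy(links)))  # 4
print(len(fixed(links)))  # 2
print(fixed(links))
```

In the buggy output, the third entry still ends in /photo/1 (only /people had been removed at that point), which matches the duplicated consequence link in the question's output.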
Alternatively, you could use a compiled regular expression that matches any of the stopwords and removes them in a single pass:
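import re
stopwords = ['/people', '/photo/1']
# One pattern matching any stopword; re.escape makes each one match literally
pattern = re.compile('|'.join(map(re.escape, stopwords)))
# Remove every match from each link
link_list = [pattern.sub('', link) for link in links]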
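A runnable sketch of the regex approach on sample data follows. The links list is assumed (two URLs from the question's output). It also adds one refinement that is not part of the answer above: the stopwords are sorted longest-first before joining. Python's re tries alternatives left to right, so with overlapping stopwords such as '/photo' and '/photo/1', the shorter one could otherwise match first and leave '/1' behind.

```python
import re

stopwords = ['/people', '/photo/1']

# Sample input, assumed for illustration (copied from the question's output)
links = [
    'https://twitter.com/EVStyle2/status/1492292266169298944',
    'https://twitter.com/consequence/status/1492245783084773383/photo/1',
]

# Longest stopwords first, so a longer match wins over a shorter prefix of it
ordered = sorted(stopwords, key=len, reverse=True)
pattern = re.compile('|'.join(map(re.escape, ordered)))

link_list = [pattern.sub('', link) for link in links]
print(link_list)
```

With these stopwords the sort makes no difference to the result; it only matters once one stopword is a prefix of another.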
Upvotes: 4