Reputation: 482
I want to iterate through this list and, for each line, find and replace some words (internet addresses, to be precise) using a regex, while keeping each line intact as a single string.
aList = [
"being broken changes people, \nand rn im missing the old me",
"@SaifAlmazroui @troyboy621 @petr_hruby you're all missing the point",
"#News #Detroit Detroit water customer receives shutoff threat over missing 10 cents: - Theresa Braxton is a l... T.CO/CHPBRVH9WKk",
"@_EdenRodwell \ud83d\ude29\ud83d\ude29ahh I love you!! Missing u, McDonald's car park goss soon please \u2764\ufe0f\u2764\ufe0fxxxxx",
"This was my ring tone, before I decided change was good and missing a call was insignificant T.CO?BUXLVZFDWQ",
"want to go on holiday again, missing the sun\ud83d\ude29\u2600\ufe0f"
]
This code below almost does that, but it breaks the list into words separated by lines:
import re

i = 0
while i < len(aList):
    for line in aList[i].split():
        line = re.sub(r"^[http](.*)\/(.*)$", "", line)
        print(line)
    i += 1
I'd like the result to be the same list, with only the internet addresses removed from each line:
[
"being broken changes people, \nand rn im missing the old me",
"@SaifAlmazroui @troyboy621 @petr_hruby you're all missing the point",
"#News #Detroit Detroit water customer receives shutoff threat over missing 10 cents: - Theresa Braxton is a ",
"@_EdenRodwell \ud83d\ude29\ud83d\ude29ahh I love you!! Missing u, McDonald's car park goss soon please \u2764\ufe0f\u2764\ufe0fxxxxx",
"This was my ring tone, before I decided change was good and missing a call was insignificant",
"want to go on holiday again, missing the sun\ud83d\ude29\u2600\ufe0f"
]
Thanks
Upvotes: 1
Views: 1151
Reputation: 113924
From this:
re.sub(r"^[http](.*)\/(.*)$", "", line)
it looks to me as if you expect that all your URLs will be at the end of the line. In that case, try:
[re.sub('http://.*', '', s) for s in aList]
Here, http:// matches the literal text http://, and .* matches everything that follows it on the same line.
Here is your list with some URLs added:
aList = [
"being broken changes people, \nand rn im missing the old me",
"@SaifAlmazroui @troyboy621 @petr_hruby you're all missing the point",
"#News #Detroit Detroit water customer receives shutoff threat over missing 10 cents: - Theresa Braxton is a http://example.com/CHPBRVH9WKk",
"@_EdenRodwell ahh I love you!! Missing u, McDonald's car park goss soon please xxxxx",
"This was my ring tone, before I decided change was good and missing a call was insignificant http://example.com?BUXLVZFDWQ",
"want to go on holiday again, missing the sun"
]
Here is the result:
>>> [re.sub('http://.*', '', s) for s in aList]
['being broken changes people, \nand rn im missing the old me',
"@SaifAlmazroui @troyboy621 @petr_hruby you're all missing the point",
'#News #Detroit Detroit water customer receives shutoff threat over missing 10 cents: - Theresa Braxton is a ',
"@_EdenRodwell ahh I love you!! Missing u, McDonald's car park goss soon please xxxxx",
'This was my ring tone, before I decided change was good and missing a call was insignificant ',
'want to go on holiday again, missing the sun']
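If the URLs can also show up in the middle of a line, or use https, a slightly broader pattern may be a better fit. This is only a minimal sketch, assuming a URL is any run of non-whitespace characters after http:// or https://. The sample strings and the name url_pattern are made up for illustration:

```python
import re

# Assumption: a URL starts with http:// or https:// and runs up to the
# next whitespace character. Real-world URLs can be messier than this.
url_pattern = re.compile(r'https?://\S+')

# Hypothetical sample lines, not taken from the question.
samples = [
    "see https://example.com/abc for details",
    "no address here",
    "ends with one http://example.com?BUXLVZFDWQ",
]

# Substituting on the whole string keeps each line as a single list item.
cleaned = [url_pattern.sub('', s) for s in samples]
print(cleaned)
```

Note that the whitespace around a removed URL is left in place, so you may want to strip or collapse spaces afterwards.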
Upvotes: 2
Reputation: 100
Your question is a little unclear, but I think I get what you're going for:
newlist = [re.sub(r"{regex}", "", line) for line in aList]
Here {regex} stands in for your pattern. This uses a Python list comprehension to iterate through the list of strings and replace any substring that matches the pattern with an empty string.
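For the sample data in the question, the addresses look like t.co short links without any http:// prefix. As a rough sketch, assuming that is always the case (the name tco_pattern and the shortened sample lines are mine, not from the question), the placeholder could be filled in like this:

```python
import re

# Assumption: every address is a t.co short link such as "T.CO/CHPBRVH9WKk".
# \b keeps the pattern from matching in the middle of a longer word.
tco_pattern = re.compile(r'\bt\.co\S*', re.IGNORECASE)

# Shortened versions of three lines from the question.
aList = [
    "you're all missing the point",
    "Theresa Braxton is a l... T.CO/CHPBRVH9WKk",
    "missing a call was insignificant T.CO?BUXLVZFDWQ",
]

# One re.sub call per line, so lines are never split into words.
newlist = [tco_pattern.sub('', line) for line in aList]
print(newlist)
```

Because the substitution runs on each whole string instead of on the output of split(), the result is still one item per line.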
Side note:
Looking closer at your regex, it is not doing what you think it is doing. I would look at this Stack Overflow post about matching URLs with a regex:
Regex to find urls in string in Python
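To see what goes wrong with the pattern from the question: [http] is a character class, so it matches a single character that is h, t, or p, not the literal text http. A small check, using a few test words I picked for illustration:

```python
import re

# The pattern from the question, unchanged.
question_pattern = r"^[http](.*)\/(.*)$"

# Hypothetical test words chosen to show the behaviour.
words = ["to/from", "http://example.com/x", "T.CO/CHPBRVH9WKk", "hello"]

# True means the pattern matches, so re.sub would blank out that word.
matches = {word: bool(re.search(question_pattern, word)) for word in words}
for word, matched in matches.items():
    print(word, matched)
```

So an ordinary word like to/from is matched, because it starts with t and contains a slash, while T.CO/CHPBRVH9WKk is not, because the class only contains lowercase letters.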
Upvotes: 2