TheSoldier

Reputation: 482

Iterate and replace words in lines of a tuple python

I want to iterate through this list and, for each line, go through its words to find and remove some of them (internet addresses, specifically) using a regex, while keeping each line intact.

aList = [
  "being broken changes people, \nand rn im missing the old me", 
  "@SaifAlmazroui @troyboy621 @petr_hruby you're all missing the point", 
  "#News #Detroit Detroit water customer receives shutoff threat over missing 10 cents: - Theresa Braxton is a l... T.CO/CHPBRVH9WKk", 
  "@_EdenRodwell \ud83d\ude29\ud83d\ude29ahh I love you!! Missing u, McDonald's car park goss soon please \u2764\ufe0f\u2764\ufe0fxxxxx", 
  "This was my ring tone, before I decided change was good and missing a call was insignificant T.CO?BUXLVZFDWQ", 
  "want to go on holiday again, missing the sun\ud83d\ude29\u2600\ufe0f"
]

The code below almost does that, but it prints each word on its own line instead of keeping the lines together:

i=0
while i<len(aList):
    for line in aList[i].split():
        line = re.sub(r"^[http](.*)\/(.*)$", "", line)
        print (line)
        i+=1

I'd like the result to be the same list, minus the internet addresses in each line:

[
  "being broken changes people, \nand rn im missing the old me", 
  "@SaifAlmazroui @troyboy621 @petr_hruby you're all missing the point", 
  "#News #Detroit Detroit water customer receives shutoff threat over missing 10 cents: - Theresa Braxton is a ", 
  "@_EdenRodwell \ud83d\ude29\ud83d\ude29ahh I love you!! Missing u, McDonald's car park goss soon please \u2764\ufe0f\u2764\ufe0fxxxxx", 
  "This was my ring tone, before I decided change was good and missing a call was insignificant", 
  "want to go on holiday again, missing the sun\ud83d\ude29\u2600\ufe0f"
]

Thanks

Upvotes: 1

Views: 1151

Answers (2)

John1024

Reputation: 113924

From this:

re.sub(r"^[http](.*)\/(.*)$", "", line)

it looks to me as if you expect that all your URLs will be at the end of the line. In that case, try:

[re.sub('http://.*', '', s) for s in aList]

Here, http:// matches the literal text http://, and .* matches everything that follows it up to the end of the line, so the URL and anything after it are removed.

Example

Here is your list with some URLs added:

aList = [
  "being broken changes people, \nand rn im missing the old me",
  "@SaifAlmazroui @troyboy621 @petr_hruby you're all missing the point",
  "#News #Detroit Detroit water customer receives shutoff threat over missing 10 cents: - Theresa Braxton is a http://example.com/CHPBRVH9WKk",
  "@_EdenRodwell ahh I love you!! Missing u, McDonald's car park goss soon please xxxxx",
  "This was my ring tone, before I decided change was good and missing a call was insignificant http://example.com?BUXLVZFDWQ",
  "want to go on holiday again, missing the sun"
  ]

Here is the result:

>>> [re.sub('http://.*', '', s) for s in aList]
['being broken changes people, \nand rn im missing the old me',
 "@SaifAlmazroui @troyboy621 @petr_hruby you're all missing the point",
 '#News #Detroit Detroit water customer receives shutoff threat over missing 10 cents: - Theresa Braxton is a ',
 "@_EdenRodwell ahh I love you!! Missing u, McDonald's car park goss soon please xxxxx",
 'This was my ring tone, before I decided change was good and missing a call was insignificant ',
 'want to go on holiday again, missing the sun']
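The pattern above relies on an http:// prefix, while the links in the question's original data are bare ones such as T.CO/CHPBRVH9WKk. A minimal sketch of one way to cover both cases follows. The pattern URL_RE and the helper strip_urls are hypothetical names introduced here, and the pattern is only a rough heuristic (a scheme, or a dotted host followed by / or ?), not a full URL grammar:

```python
import re

# Hypothetical heuristic, not a complete URL grammar:
#   - anything that starts with http:// or https://, or
#   - a dotted host name immediately followed by "/" or "?" and a path.
# The leading \s* also removes the whitespace in front of the link.
URL_RE = re.compile(
    r"\s*\b(?:https?://\S+|[\w-]+(?:\.[\w-]+)+[/?]\S*)",
    re.IGNORECASE,
)

def strip_urls(lines):
    """Return a new list with URL-like tokens removed from each line."""
    return [URL_RE.sub("", line) for line in lines]

sample = [
    "being broken changes people, \nand rn im missing the old me",
    "Theresa Braxton is a l... T.CO/CHPBRVH9WKk",
    "missing a call was insignificant T.CO?BUXLVZFDWQ",
    "see http://example.com for more",
]

for line in strip_urls(sample):
    print(repr(line))
```

Because the substitution runs on whole lines, each element stays a single string and embedded newlines are preserved. A bare host without a scheme or a path (for example "example.com") is deliberately left alone in this sketch, to avoid removing ordinary dotted text.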

Upvotes: 2

Grant Powell

Reputation: 100

Your question is a little unclear, but I think I get what you're going for:

newlist = [re.sub(r"{regex}", "", line) for line in aList]

This uses a Python list comprehension to iterate through the list of strings and replace every substring that matches your regex pattern with an empty string. Here {regex} is a placeholder for your own pattern.
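If you would rather test word by word, as in the question, the same comprehension can wrap a small helper that splits a line, drops the matching words, and joins the rest back together. This is only a sketch: WORD_URL_RE and drop_url_words are hypothetical names, and the pattern is a rough guess at what counts as an internet address in the sample data.

```python
import re

# Hypothetical word-level test: a word counts as a URL if it starts with a
# scheme, or if it is a dotted host followed by "/" or "?".
WORD_URL_RE = re.compile(
    r"(?:https?://\S+|[\w-]+(?:\.[\w-]+)+[/?]\S*)",
    re.IGNORECASE,
)

def drop_url_words(line):
    # Split on single spaces only, so a newline stays attached to its word
    # and the line is rebuilt exactly apart from the removed words.
    kept = [word for word in line.split(" ") if not WORD_URL_RE.fullmatch(word)]
    return " ".join(kept)

aList = [
    "being broken changes people, \nand rn im missing the old me",
    "missing a call was insignificant T.CO?BUXLVZFDWQ",
]

newlist = [drop_url_words(line) for line in aList]
print(newlist)
```

Splitting with split(" ") rather than split() is the design choice that matters here: split() would also split on the newline inside the first string, and joining with spaces would then lose it.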

Side note:

Looking closer at your regex, it doesn't do what you think it does. I would look at this Stack Overflow post about matching URLs with a regex:

Regex to find urls in string in Python
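To illustrate the side note: in the question's pattern, [http] is a character class that matches a single character (h, t or p), not the literal text "http". A short check, using a few made-up test words, shows what the pattern actually accepts:

```python
import re

# The pattern from the question, unchanged.
asker_pattern = re.compile(r"^[http](.*)\/(.*)$")

# Made-up test words, chosen only to illustrate the behaviour.
words = ["http://example.com/a", "this/that", "T.CO/CHPBRVH9WKk", "https"]

matches = {word: bool(asker_pattern.search(word)) for word in words}
for word, matched in matches.items():
    print(word, matched)
```

The pattern matches "http://example.com/a", but it also matches "this/that", because that word starts with t and contains a slash. It does not match "T.CO/CHPBRVH9WKk", since the class is case-sensitive and the word starts with an uppercase T, so the links in the sample data are never removed.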

Upvotes: 2
