Reputation: 23883
I'm trying to remove all the elements that contain special characters or letters, but some of the unwanted elements are still there.
description_list = ['$', '2,850', 'door', '.', 'sale', '...', 'trades', '.', 'pay', 'pp', 'fees', 'shipping', 'cost', 'desirable', '\x932', 'liner', 'dial\x94', 'eta', 'movement', 'watch', '\x93safe', 'queen\x94', ',', 'pristine', 'condition', '.', 'i\x92m', 'original', 'owner', 'worn', 'watch', 'gently', 'handful', 'times', '.', 'protective', 'plastics', 'still', 'intact', 'case', 'back', ',', 'parts', 'clasp', 'full', 'original', 'kit', 'you\x92ll', 'see', 'pics', '.', 'includes', 'original', 'boxes', ',', 'manuals', ',', 'warranty', 'card', 'ad', ',', 'spare', 'bracelet', 'links', ',', 'dive', 'strap', '&', 'extension', ',', 'etc', 'payment', 'paypal', ',', 'due', 'quickly', 'upon', 'agreement', 'purchase', 'watch', '.', 'holds', ',', 'delays', ',', 'games', '.', 'pay', 'pp', 'fees', 'shipping', 'us', 'postal', 'service', 'priority', 'mail', 'w/signature', 'confirmation', ',', 'paypal', 'verified', 'address', 'inside', 'usa', '.', 'please', 'don\x92t', 'ask', 'ship', 'outside', 'usa', '.', 'exceptions', 'made', '.', 'please', 'e-mail', '[', 'email', 'protected', ']', '.', 'also', 'text', 'call', '210-705-3383.', 'name', 'james', 'crockett', 'thank', ',', 'james', 'crockett', '$', '2,850', 'door', '.', 'sale', '...', 'trades', '.', 'pay', 'pp', 'fees', 'shipping', 'cost', 'desirable', '\x932', 'liner', 'dial\x94', 'eta', 'movement', 'watch', '\x93safe', 'queen\x94', ',', 'pristine', 'condition', '.', 'i\x92m', 'original', 'owner', 'worn', 'watch', 'gently', 'handful', 'times', '.', 'protective', 'plastics', 'still', 'intact', 'case', 'back', ',', 'parts', 'clasp', 'full', 'original', 'kit', 'you\x92ll', 'see', 'pics', '.', 'includes', 'original', 'boxes', ',', 'manuals', ',', 'warranty', 'card', 'ad', ',', 'spare', 'bracelet', 'links', ',', 'dive', 'strap', '&', 'extension', ',', 'etc', 'payment', 'paypal', ',', 'due', 'quickly', 'upon', 'agreement', 'purchase', 'watch', '.', 'holds', ',', 'delays', ',', 'games', '.', 'pay', 'pp', 'fees', 'shipping', 'us', 
'postal', 'service', 'priority', 'mail', 'w/signature', 'confirmation', ',', 'paypal', 'verified', 'address', 'inside', 'usa', '.', 'please', 'don\x92t', 'ask', 'ship', 'outside', 'usa', '.', 'exceptions', 'made', '.', 'please', 'e-mail', '[', 'email', 'protected', ']', '.', 'also', 'text', 'call', '210-705-3383.', 'name', 'james', 'crockett', 'thank', ',', 'james', 'crockett']
price_list = [x for x in description_list if any(c.isdigit() for c in x)]
Output
# price_list
['2,850', '\x932', '210-705-3383.', '2,850', '\x932', '210-705-3383.']
It should look like this (the comma is acceptable because I want to extract the price):
['2,850', '2,850']
Upvotes: 0
Views: 77
Reputation: 2222
A regex answer:
import re
# Raw string so that \d is read by the regex engine, not as a string escape.
price_list = [x for x in description_list if re.match(r'\d+(,*\d+)?$', x)]
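If prices with more than one comma group (such as 1,234,567) may appear, a slightly stricter pattern could help. The following is only a sketch: `sample` is a hypothetical short stand-in for `description_list`, and `PRICE_RE` is a name introduced here for illustration. It uses `re.fullmatch`, so no `$` anchor is needed.

```python
import re

# Hypothetical sample standing in for description_list from the question.
sample = ['$', '2,850', 'door', '.', '\x932', ',', '210-705-3383.', '1,234,567', '42']

# One or more digits, followed by zero or more "comma + digits" groups.
PRICE_RE = re.compile(r'\d+(?:,\d+)*')

# fullmatch requires the whole string to match the pattern.
price_list = [x for x in sample if PRICE_RE.fullmatch(x)]
print(price_list)  # ['2,850', '1,234,567', '42']
```

Unlike the pattern above, this one rejects strings with doubled commas such as '2,,850'.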
Upvotes: 2
Reputation: 109536
You were close, assuming you want to keep elements that consist only of digits, optionally with commas. Your current list comprehension for price_list returns every string that contains at least one digit. Instead, strip the commas and check that what remains is all digits:
price_list = [str(x) for x in description_list if str(x).replace(',', '').isdigit()]
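To make the difference concrete, here is a small comparison of the two tests on a few tokens taken from the question. The names `tokens`, `kept_any`, and `kept_strict` are introduced here only for illustration.

```python
# A few tokens from the question's description_list.
tokens = ['2,850', '\x932', '210-705-3383.', 'door', ',']

# Original test: true as soon as ONE character is a digit.
kept_any = [t for t in tokens if any(c.isdigit() for c in t)]

# Stricter test: after removing commas, EVERY character must be a digit.
kept_strict = [t for t in tokens if t.replace(',', '').isdigit()]

print(kept_any)     # ['2,850', '\x932', '210-705-3383.']
print(kept_strict)  # ['2,850']
```

A lone ',' is rejected by the stricter test because it becomes an empty string, and `''.isdigit()` is False.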
Upvotes: 1
Reputation: 26039
You can do an all() check inside the list comprehension to verify that every character is a digit or a comma, and then exclude elements that are just a comma:
price_list = [x for x in description_list if all(c.isdigit() or c == ',' for c in x) and x != ',']
# ['2,850', '2,850']
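If the end goal is a numeric price, the filtered strings can then be converted. This is a minimal sketch that assumes the prices are whole numbers using commas only as thousands separators; `prices` is a name introduced here.

```python
# Result of the filtering step above.
price_list = ['2,850', '2,850']

# Assumption: commas are thousands separators and there is no decimal part.
prices = [int(p.replace(',', '')) for p in price_list]
print(prices)  # [2850, 2850]
```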
Upvotes: 3