Reputation: 327
I need to create an exception for isalnum() so that it does not remove the "_" after the word NOT.
This is the data structure I have:
data = ['TEXT TEXT. TEXT! NOT_TEXT', 'TEXT? NOT_TEXT, TEXT NOT_TEXT']
The code I have:
for text in range(len(data)):
    data_new = ["".join([char for char in text if char.isalnum() or char == " "]) for text in data]
Outcome I have:
['TEXT TEXT TEXT NOTTEXT', 'TEXT NOTTEXT TEXT NOTTEXT']
However, I need to create an exception for isalnum() so that it does not remove the "_" after the word NOT.
Expected outcome:
['TEXT TEXT TEXT NOT_TEXT', 'TEXT NOT_TEXT TEXT NOT_TEXT']
Upvotes: 0
Views: 239
Reputation: 495
Your code already has an exception for space characters, so why not simply extend it to include the underscore character as well? Also, your outer for loop is redundant: the list comprehension already iterates over data, so the loop just rebuilds the same list len(data) times. Your formatting could be improved too; it's better not to write the whole expression on one line.
The following snippet is still not perfect, but it does what you want. I’m sure you can improve on it further.
data = ['TEXT TEXT. TEXT! NOT_TEXT', 'TEXT? NOT_TEXT, TEXT NOT_TEXT']
data_new = [
    "".join(
        char for char in text if char.isalnum() or char in " _"
    ) for text in data
]
print(data_new)
Upvotes: 1
Reputation: 2086
If you're just trying to get the desired output, you can use a regex. The \w class matches letters, digits, and the underscore, so NOT_TEXT is kept as one token.
import re
data = ["TEXT TEXT. TEXT! NOT_TEXT", "TEXT? NOT_TEXT, TEXT NOT_TEXT"]
for item in data:
    data_new = re.findall(r"\w+", item)
    print(data_new)
Output
['TEXT', 'TEXT', 'TEXT', 'NOT_TEXT']
['TEXT', 'NOT_TEXT', 'TEXT', 'NOT_TEXT']
Or, with a list comprehension:
new_data = [re.findall(r"\w+", item) for item in data]
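Note that re.findall returns a list of tokens per string, not the single cleaned string shown in the question's expected outcome. As a rough sketch, assuming the tokens should be rejoined with single spaces (the name joined is just for illustration), it could look like this:

```python
import re

data = ["TEXT TEXT. TEXT! NOT_TEXT", "TEXT? NOT_TEXT, TEXT NOT_TEXT"]

# \w+ matches runs of letters, digits and underscores, so "NOT_TEXT" survives
# as one token; punctuation is dropped because it matches nothing.
# "joined" is a hypothetical name, not part of the original answer.
joined = [" ".join(re.findall(r"\w+", item)) for item in data]

print(joined)
# ['TEXT TEXT TEXT NOT_TEXT', 'TEXT NOT_TEXT TEXT NOT_TEXT']
```

One difference from the character-filter approach: rejoining with " " collapses any run of whitespace into a single space, whereas filtering character by character keeps the original spacing.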
Upvotes: 0
Reputation: 114440
isalnum is part of the decision of which characters to keep. So is == ' '. You can add one more item to the condition:
char.isalnum() or char in " _"
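For completeness, here is a minimal sketch of that condition dropped into the question's list comprehension, without the redundant outer loop. The constant name KEEP is an assumption made for readability and does not come from the original posts.

```python
data = ['TEXT TEXT. TEXT! NOT_TEXT', 'TEXT? NOT_TEXT, TEXT NOT_TEXT']

# Hypothetical constant: extra characters to keep besides alphanumerics.
KEEP = " _"

data_new = [
    # Keep a character if it is alphanumeric or one of the characters in KEEP.
    "".join(char for char in text if char.isalnum() or char in KEEP)
    for text in data
]

print(data_new)
# ['TEXT TEXT TEXT NOT_TEXT', 'TEXT NOT_TEXT TEXT NOT_TEXT']
```

Adding another character to keep later (a hyphen, say) then only means extending KEEP.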
Upvotes: 2