user14738548

Reputation: 327

How to keep "_" when filtering characters with isalnum() in Python

I am filtering strings with isalnum(), and I need an exception so that the "_" after the word NOT is not removed.

This is the data structure I have:

data = ['TEXT TEXT. TEXT! NOT_TEXT', 'TEXT? NOT_TEXT, TEXT NOT_TEXT']

The code I have:

for text in range(len(data)):
    data_new = ["".join([char for char in text if char.isalnum() or char == " "]) for text in data]

The output I get:

['TEXT TEXT TEXT NOTTEXT', 'TEXT NOTTEXT TEXT NOTTEXT']

However, I need the "_" after the word NOT to be kept.

Expected output:

['TEXT TEXT TEXT NOT_TEXT', 'TEXT NOT_TEXT TEXT NOT_TEXT']

Upvotes: 0

Views: 239

Answers (3)

inof

Reputation: 495

Your code already has an exception for space characters, so why not simply extend it to include the underscore character as well? Also, your outer for loop is redundant: the list comprehension already iterates over data, so the loop just rebuilds the same list len(data) times. Your formatting could be improved too, i.e. it's better not to write the whole expression on one line.

The following snippet is still not perfect, but it does what you want. I’m sure you can improve on it further.

data = ['TEXT TEXT. TEXT! NOT_TEXT', 'TEXT? NOT_TEXT, TEXT NOT_TEXT']

data_new = [
    "".join(
        char for char in text if char.isalnum() or char in " _"
    ) for text in data
]

print(data_new)

Upvotes: 1

ptts

Reputation: 2086

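If you're just trying to get the desired output, you can use a regex. The \w class matches letters, digits, and the underscore, so NOT_TEXT stays intact. Note that re.findall returns a list of tokens for each string rather than a single cleaned string.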

import re

data = ["TEXT TEXT. TEXT! NOT_TEXT", "TEXT? NOT_TEXT, TEXT NOT_TEXT"]

for item in data:
    data_new = re.findall(r"\w+", item)
    print(data_new)

Output

['TEXT', 'TEXT', 'TEXT', 'NOT_TEXT']
['TEXT', 'NOT_TEXT', 'TEXT', 'NOT_TEXT']

Or, with a list comprehension:

new_data = [re.findall(r"\w+", item) for item in data]
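To get back single strings in the shape the question expects, the tokens can be joined with a space. This is a minimal sketch built on the snippet above; note that, unlike the character filter in the question, it also collapses any run of whitespace into one space.

```python
import re

data = ["TEXT TEXT. TEXT! NOT_TEXT", "TEXT? NOT_TEXT, TEXT NOT_TEXT"]

# \w+ matches runs of letters, digits, and underscores, so NOT_TEXT is one token.
# Joining the tokens with a space rebuilds one cleaned string per input string.
new_data = [" ".join(re.findall(r"\w+", item)) for item in data]

print(new_data)
# ['TEXT TEXT TEXT NOT_TEXT', 'TEXT NOT_TEXT TEXT NOT_TEXT']
```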

Upvotes: 0

Mad Physicist

Reputation: 114440

The isalnum() check is one part of the condition that decides which characters to keep; the == " " comparison is another. You can add the underscore to that condition:

char.isalnum() or char in " _"
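The condition above keeps every underscore. If the requirement is read literally, so that only an underscore directly after NOT survives, a regex with a negative lookbehind is one possible sketch. The helper name strip_punctuation is hypothetical, and \w is assumed to be close enough to isalnum() plus "_" for this data.

```python
import re

def strip_punctuation(text):
    # Hypothetical helper, not part of the original answers.
    # [^\w ]     removes anything that is not a letter, digit, underscore, or space.
    # (?<!NOT)_  removes an underscore unless the three characters before it are "NOT".
    return re.sub(r"[^\w ]|(?<!NOT)_", "", text)

data = ['TEXT TEXT. TEXT! NOT_TEXT', 'TEXT? NOT_TEXT, TEXT NOT_TEXT']
data_new = [strip_punctuation(text) for text in data]

print(data_new)
# ['TEXT TEXT TEXT NOT_TEXT', 'TEXT NOT_TEXT TEXT NOT_TEXT']
print(strip_punctuation("A_B NOT_C"))
# AB NOT_C
```

The lookbehind only checks the three preceding characters, so a word that merely ends in NOT (for example KNOT_X) would keep its underscore as well.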

Upvotes: 2
