Reputation: 137
I have a list of strings and need to remove all special characters (, - ' " .) from them.
My code is:
import glob
import re
files = []
for text in glob.glob("*.txt.txt"):
    with open(text) as f:
        fileRead = [line.lower() for line in f]
    files.append(fileRead)
files1 = []
for item in files:
    files1.append(''.join(item))
I have tried a lot of options, including "replace", "strip" and "re".
When I use strip (shown below), the code runs but the output is unchanged.
files1 = [line.strip("'") for line in files1]
When I use re, I get TypeError: expected string or bytes-like object. I converted the list of lists into a list of strings so that I could use re. This approach is suggested in many places, but it did not solve the problem for me.
files1 = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", files1)
I cannot use replace either, because it raises an AttributeError saying that lists have no replace method.
Please suggest how I can get rid of all special characters.
Upvotes: 0
Views: 19654
Reputation: 2117
Try the example below:
files = ["Hello%", "&*hhf", "ddh", "GTD@JJ"]  # input data as a list of strings
# Go through each element of the list,
# keep only the characters that are letters or digits (dropping special symbols),
# then join the remaining characters back together and collect the results in a list.
result = ["".join(filter(str.isalnum, line)) for line in files]
print(result)  # print the result
Output:
['Hello', 'hhf', 'ddh', 'GTDJJ']
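One thing to watch with the filter above: str.isalnum also rejects spaces, so words in a sentence get glued together. A minimal sketch of a variant that keeps whitespace, assuming that is what you want (the helper name keep_alnum_and_spaces and the sample lines are made up for illustration):

```python
def keep_alnum_and_spaces(line):
    """Drop every character that is neither alphanumeric nor whitespace."""
    return "".join(ch for ch in line if ch.isalnum() or ch.isspace())


# Hypothetical sample data, not taken from the question's files.
lines = ["Hello, world!", "it's a-test."]
cleaned = [keep_alnum_and_spaces(line) for line in lines]
print(cleaned)  # ['Hello world', 'its atest']
```

Because str.isspace is also true for newlines, the line breaks of a file read into one string would survive this filter as well.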
Upvotes: 0
Reputation: 2686
You can use str.isalnum.
It returns True if every character in the string is alphanumeric (and the string is not empty).
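To make that concrete, here is a small sketch of how str.isalnum behaves on whole strings and how it could be applied character by character to strip symbols. The sample values are only illustrative:

```python
# Called on a whole string, isalnum() answers a yes/no question.
whole_ok = "abc123".isalnum()    # True: only letters and digits
whole_bad = "abc-123".isalnum()  # False: contains a hyphen
empty = "".isalnum()             # False: empty strings never qualify

# To actually remove the symbols, test each character and keep the ones that pass.
word = "GTD@JJ"  # sample value for illustration
cleaned = "".join(ch for ch in word if ch.isalnum())
print(whole_ok, whole_bad, empty, cleaned)  # True False False GTDJJ
```

So on its own the method only tells you whether a string is clean; the per-character join is what does the cleaning.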
Upvotes: 0
Reputation: 96
You should apply re.sub to each string individually, not to the list as a whole. Using files1 (the list of strings) from the question:
files_cleaned = [re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", file) for file in files1]
If you only want to keep alphanumeric characters, you can do this instead (note that it also removes spaces and newlines):
files_cleaned = [re.sub(r"[^a-zA-Z0-9]", "", file) for file in files1]
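If you only need to drop the handful of characters listed in the question (, - ' " .), a regex is not strictly required. As an alternative to re.sub, here is a rough sketch using str.translate with a deletion table built by str.maketrans. The names SPECIAL, table and texts are just placeholders, and the sample strings are invented:

```python
# Characters to delete: comma, hyphen, single quote, double quote, period.
SPECIAL = ",-'\"."

# With three arguments, str.maketrans maps every character of the third
# argument to None, which makes str.translate delete it.
table = str.maketrans("", "", SPECIAL)

# Hypothetical sample strings standing in for the file contents.
texts = ['it\'s a well-known "fact", really.', "no specials here"]
cleaned = [t.translate(table) for t in texts]
print(cleaned)  # ['its a wellknown fact really', 'no specials here']
```

This also shows why strip("'") appeared to do nothing: strip only removes characters from the two ends of a string, while translate (like re.sub) works on every position.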
Upvotes: 4