Reputation: 143
I am trying to remove all characters except alphabets along with the spaces.
This is what my code looks like.
sampletext.txt contains words mixed with other characters, and I am writing the result to removed.txt.
When I run this code, I get only blanks in removed.txt.
import re
import sys
filename = open("removed.txt",'w')
sys.stdout = filename
from string import ascii_letters
allowed = set(ascii_letters + ' ')
with open("/Desktop/stem_analysis/sampletext.txt", 'r') as f:
    answer = ''.join(l for l in f if l in allowed)
print(answer)
What's the problem with my code?
Upvotes: 1
Views: 2276
Reputation: 98961
I am trying to remove all characters except alphabets along with the spaces.
I'm not 100% sure of what you're trying to do, but to remove all characters except alphabets along with the spaces, you can use something like:
import re

with open("old_file.txt", "r") as f, open("new_file.txt", "w") as n:
    x = f.read()
    # Drop everything that is not a letter or whitespace (case-insensitive)
    result = re.sub(r"[^a-z\s]", "", x, flags=re.IGNORECASE)
    n.write(result)
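Regex Explanation: [^a-z\s] matches any single character that is not a letter from a to z or a whitespace character, and re.IGNORECASE makes a-z cover uppercase letters as well. Every match is replaced with an empty string, so only letters and whitespace (including newlines) remain.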
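As a quick check of the pattern, here is a minimal sketch that applies the same substitution to an in-memory string instead of a file. The sample text and the name strip_non_letters are made up for illustration.

```python
import re

def strip_non_letters(text):
    # Remove every character that is not a letter a-z (either case) or whitespace.
    # \s also matches tabs and newlines, so line breaks are preserved.
    return re.sub(r"[^a-z\s]", "", text, flags=re.IGNORECASE)

sample = "Hello, World! 123\nfoo_bar"
cleaned = strip_non_letters(sample)
print(cleaned)
```

Punctuation, digits and the underscore are dropped, while the spaces and the line break stay in place.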
Upvotes: 1
Reputation: 1327
Something like this
import re
re.sub(r'[^a-zA-Z]', '', your_string)
should do what you’re asking except for the spaces part. I’m sure you can figure out how to add that in to the regex as well.
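One way to handle the spaces part is to put a literal space inside the negated character class. This is only a sketch, and keep_letters_and_spaces is a hypothetical name:

```python
import re

def keep_letters_and_spaces(your_string):
    # The ^ sits inside the brackets, where it negates the class.
    # The space before the closing bracket keeps spaces as well.
    return re.sub(r"[^a-zA-Z ]", "", your_string)

print(keep_letters_and_spaces("a1 b2, c3!"))
```

Note that this variant also strips newlines and tabs, since only the plain space is allowed.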
Upvotes: 0
Reputation: 53
This will give you all the characters that are not in the alphabet. Add another if statement to check for spaces.
def letters(text):
    return ''.join(c for c in text if not c.isalpha())
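For the opposite filter, which is what the question asks for, a sketch along these lines keeps the letters and the spaces. The function name is hypothetical. Be aware that str.isalpha() is also true for non-ASCII letters such as 'é', unlike the ascii_letters set used in the question.

```python
def letters_and_spaces(text):
    # Keep a character if it is alphabetic or a plain space.
    return ''.join(c for c in text if c.isalpha() or c == ' ')

print(letters_and_spaces("Hi there, number 42!"))
```

When the text comes from a file, pass f.read() to the function. Iterating over the file object itself yields whole lines rather than single characters, so a test like "l in allowed" in the question almost never matches and the output comes out blank.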
Upvotes: 0