terry

Reputation: 143

Read a text file and remove all characters except alphabets & spaces in Python

I am trying to remove all characters except letters and spaces from a text file.
This is what my code looks like.
sampletext.txt contains words mixed with other characters, and I am writing the result to removed.txt. When I run this code, I get only blanks in removed.txt.

import re
import sys
filename = open("removed.txt",'w')
sys.stdout = filename
from string import ascii_letters
allowed = set(ascii_letters + ' ')
with open("/Desktop/stem_analysis/sampletext.txt", 'r') as f:
    answer = ''.join(l for l in f if l in allowed)
print(answer)


What's the problem with my code?

Upvotes: 1

Views: 2276

Answers (3)

Pedro Lobito

Reputation: 98961

I am trying to remove all characters except alphabets along with the spaces.

I'm not 100% sure what you're trying to do, but to remove all characters except letters and whitespace, you can use something like:

import re

with open("old_file.txt", "r") as f, open("new_file.txt", "w") as n:
    x = f.read()
    # Drop everything that is not a letter or whitespace (\s also keeps newlines and tabs)
    result = re.sub(r"[^a-z\s]", "", x, flags=re.IGNORECASE)
    n.write(result)

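Regex Explanation:

[^a-z\s] matches any single character that is not a letter a-z or whitespace (\s). The re.IGNORECASE flag extends the letter range to uppercase, so only letters and whitespace survive the substitution.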

Upvotes: 1

Blake

Reputation: 1327

Something like this

import re
re.sub(r'[^a-zA-Z]', '', your_string)

should do what you're asking, except for the spaces part. I'm sure you can figure out how to add that to the regex as well.
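
For illustration, here is a minimal sketch of one way to add the space to the character class, with the `^` placed inside the brackets so that it negates the class. The function name `keep_letters_and_spaces` and the sample string are made up for this example. Note that a literal space, unlike `\s`, does not match newlines or tabs, so those are removed too.

```python
import re

def keep_letters_and_spaces(text):
    # [^a-zA-Z ] matches any single character that is NOT an ASCII letter or a space,
    # so substituting it with '' leaves only letters and spaces.
    return re.sub(r"[^a-zA-Z ]", "", text)

print(keep_letters_and_spaces("Hello, World! 123"))
```

If line breaks should be preserved, `\s` can be used in place of the literal space.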

Upvotes: 0

easyProgrammer

Reputation: 53

This returns all the characters that are not letters. To keep the letters instead, drop the "not", and add another condition to check for spaces.

def letters(text):
    # Collect every character for which isalpha() is False
    return ''.join(c for c in text if not c.isalpha())
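
A character-by-character version that keeps letters and spaces might look like the sketch below. It also hints at why the code in the question writes only blanks: iterating over a file object yields whole lines, and a whole line is never a member of a set of single characters. The names `ALLOWED` and `letters_and_spaces` and the sample text are assumptions for this example, and `io.StringIO` stands in for the real file.

```python
import io
from string import ascii_letters

ALLOWED = set(ascii_letters + " ")

def letters_and_spaces(text):
    # Iterate over characters and keep only ASCII letters and spaces.
    return "".join(c for c in text if c in ALLOWED)

sample = "Hi there, #42!\nSecond line.\n"

# Iterating over a file-like object yields lines, not characters,
# so no item is ever found in ALLOWED and the result is empty.
per_line = "".join(l for l in io.StringIO(sample) if l in ALLOWED)

# Reading the whole text first lets the filter see individual characters.
per_char = letters_and_spaces(io.StringIO(sample).read())

print(repr(per_line))
print(repr(per_char))
```

Unlike `str.isalpha()`, the `ALLOWED` set covers only ASCII letters, so accented characters are removed as well.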

Upvotes: 0
