How to remove multiple characters from multiple txt files

Question

I'm trying to write a script to automate a simple task: removing characters from txt files and saving each file under the same name, without those characters. I have multiple txt files (e.g. 1.txt, 2.txt ... 200.txt) stored in a directory (Documents). I also have a txt file with the characters I want to remove. At first I thought I could compare my chars_to_remove.txt against all my different files (1.txt, 2.txt...), but I couldn't find a way to do so. Instead, I created a string with all the chars I want to remove.

Let's say I have the following string in 1.txt file:

Mean concentrations α, maximum value ratio β and reductions in NO2 due to the lockdown Δ, March 2020, 2019 and 2018 in Madrid and Barcelona (Spain).

I want to remove the α, β, and Δ chars from the string. This is my code so far.

import glob 
import os 

chars_to_remove = '‘’“”|n.d.…•∈αβδΔεθϑφΣμτσχ€$∞http:www.←→≥≤<>▷×°±*⁃'

file_location = os.path.join('Desktop', 'Documents', '*.txt')
file_names = glob.glob(file_location)
print(file_names)

for f in file_names:
    outfile = open(f,'r',encoding='latin-1')
    data = outfile.read()
    if chars_to_remove in data:
        data.replace(chars_to_remove, '')
    outfile.close()

In each iteration, the variable data stores all the content of one txt file. I want to check whether any of the chars_to_remove are in the string and remove them with the replace() function. I tried different approaches suggested here and here without success.

Also, I tried to compare it as a list:

chars_to_remove = ['‘','’','“','”','|','n.d.','…','•','∈','α','β','δ','Δ','ε','θ','ϑ','φ','Σ','μ','τ','σ','χ','€','$','∞','http:','www.','←','→','≥','≤','<','>','▷','×','°','±','*','⁃']

but got datatype errors when comparing.

Any further idea will be appreciated!

Keivan Ipchi Hagh · Accepted Answer

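Your check tests whether the whole chars_to_remove string appears in data as one substring, which it never does, and str.replace() returns a new string rather than modifying data in place. It may not be the fastest approach, but why not use a regex to remove the characters/phrases?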

import re

# Regex metacharacters (| . $ *) are escaped so they match literally.
pattern = re.compile(r"(‘|’|“|”|\||n\.d\.|…|•|∈|α|β|δ|Δ|ε|θ|ϑ|φ|Σ|μ|τ|σ|χ|€|\$|∞|http:|www\.|←|→|≥|≤|<|>|▷|×|°|±|\*|⁃)")
result = pattern.sub("", 'Mean concentrations α, maximum value ratio β and reductions in NO2 due to the lockdown Δ, March 2020, 2019 and 2018 in Madrid and Barcelona (Spain).')
print(result)

Output

Mean concentrations , maximum value ratio  and reductions in NO2 due to the lockdown , March 2020, 2019 and 2018 in Madrid and Barcelona (Spain).
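To apply the same idea to every file and save the result under the same name, something like the following sketch might work. It builds the pattern from the list in the question with re.escape instead of escaping by hand. The helper names (build_pattern, clean_text, clean_files) are made up for illustration, and the sketch assumes the files are UTF-8: latin-1, as used in the question, has no α, β, or Δ, so those characters would not match after decoding.

```python
import glob
import os
import re

# Literal strings to strip, copied from the list in the question.
TOKENS_TO_REMOVE = ['‘', '’', '“', '”', '|', 'n.d.', '…', '•', '∈', 'α', 'β',
                    'δ', 'Δ', 'ε', 'θ', 'ϑ', 'φ', 'Σ', 'μ', 'τ', 'σ', 'χ', '€',
                    '$', '∞', 'http:', 'www.', '←', '→', '≥', '≤', '<', '>',
                    '▷', '×', '°', '±', '*', '⁃']


def build_pattern(tokens):
    """Compile one alternation that matches any token literally."""
    # Longest tokens first, so a multi-character token is tried before
    # any shorter token that could be a prefix of it.
    ordered = sorted(tokens, key=len, reverse=True)
    return re.compile("|".join(re.escape(t) for t in ordered))


def clean_text(text, pattern):
    """Return text with every match of pattern removed."""
    return pattern.sub("", text)


def clean_files(glob_pattern, pattern, encoding="utf-8"):
    """Rewrite each matching file in place, without the tokens."""
    for path in glob.glob(glob_pattern):
        with open(path, "r", encoding=encoding) as fh:
            data = fh.read()
        cleaned = clean_text(data, pattern)
        # Only touch files that actually changed.
        if cleaned != data:
            with open(path, "w", encoding=encoding) as fh:
                fh.write(cleaned)
```

With the directory from the question, the call would be clean_files(os.path.join('Desktop', 'Documents', '*.txt'), build_pattern(TOKENS_TO_REMOVE)). Since the files are overwritten, it is worth trying this on a copy of the directory first.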
