wanjo

Reputation: 15

Storing a text file in a list, then stripping whitespace, commas and apostrophes from the list

I am trying to strip all the whitespace, commas, and apostrophes from my list, which is read from a text file that the user specifies. I want to filter it down to just the numbers, with no spaces in between.

I tried to remove the whitespace, commas, square brackets, and apostrophes and store the result in the variable 'file_strip', but it prints the same output as 'file_stored_in_list'.

Can anyone help me come up with a solution to filter the text file down to just numbers? If there are more efficient ways of reading the text file, please let me know! Thanks!

filename = input("Input the name of the file: ")
file = open(filename, "r")

#Stores the text file into a list
file_stored_in_list = file.read().splitlines()    
file.close()

#from .txt file: Outputs ['2        7        6', '9        5        1', '4        3        8']
print(file_stored_in_list)


#Attempted to remove white-spaces, tried with commas, square brackets and apostrophes, left blank for now
file_strip = [i.strip(" ") for i in file_stored_in_list]

#Outputs the same ['2        7        6', '9        5        1', '4        3        8']
print(file_strip)

Upvotes: 0

Views: 367

Answers (3)

Pobe

Reputation: 2793

A regex sub should do the trick.

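import re

# myfile is the path to the text file
with open(myfile) as f:  # better, more pythonic
    mylines = f.readlines()

# Collapse each run of whitespace into a single space;
# strip() drops the space left behind by the trailing newline
clean_lines = [re.sub(r"\s+", " ", l).strip() for l in mylines]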

This worked for me when I tried:

>>> import re
>>> re.sub(r"\s+", " ", "a      b      c")
'a b c'
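
If the goal is digits with no spaces at all, as the question asks, the substitution can replace each run of whitespace with an empty string instead of a single space. A minimal sketch, assuming the file content shown in the question; `sample_lines` and `compact` are illustrative names, and the list stands in for what `readlines()` would return:

```python
import re

# Stand-in for f.readlines(): each line keeps its trailing newline
sample_lines = [
    "2        7        6\n",
    "9        5        1\n",
    "4        3        8\n",
]

# Replace every run of whitespace (spaces, tabs, the newline) with nothing
compact = [re.sub(r"\s+", "", line) for line in sample_lines]

print(compact)  # ['276', '951', '438']
```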

Upvotes: 0

Serge Ballesta

Reputation: 149185

Oops... you are trying to remove characters that do not exist in the file!

I would bet a coin that the content of the file is just:

2        7        6
9        5        1
4        3        8

But you read it with:

file = open(filename, "r")

#Stores the text file into a list
file_stored_in_list = file.read().splitlines()    
file.close()

From there on, file_stored_in_list is a nice list of nice strings. To make sure of it, just print it line by line:

for line in file_stored_in_list:
    print(line)

But when you print a list, Python prints square brackets ([]) around the list, separates the elements with commas, and prints the representation of each element. And the representation of a string is that string enclosed in quotes...
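
A small sketch of that difference, using rows copied from the question (the variable name `rows` is only for illustration). The brackets, commas and quotes come from how the list is displayed, not from the data:

```python
rows = ['2        7        6', '9        5        1']

# Printing the list shows repr() of each element, wrapped in brackets
print(rows)

# Printing one element shows only the characters stored in the string
print(rows[0])

# The quotes are part of the representation, not of the string itself
print("'" in rows[0])                          # False
print(repr(rows[0]) == "'" + rows[0] + "'")    # True
```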

BTW, the correct way of reading a file line by line is:

with open(filename) as file:
    for line in file:
        # process the line...
        pass
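
As one possible way to fill in the loop body, the sketch below turns each line into a list of integers. It assumes every line contains only whitespace-separated integers, as in the sample above; `io.StringIO` stands in for the real file so the example runs on its own, and `grid` is an illustrative name.

```python
import io

# Stand-in for open(filename): same content as the sample file
sample = io.StringIO(
    "2        7        6\n"
    "9        5        1\n"
    "4        3        8\n"
)

grid = []
with sample as file:
    for line in file:
        # split() with no argument splits on any run of whitespace
        # and ignores the trailing newline
        grid.append([int(token) for token in line.split()])

print(grid)  # [[2, 7, 6], [9, 5, 1], [4, 3, 8]]
```

With a file on disk, replacing the `io.StringIO(...)` object with `open(filename)` should leave the rest of the loop unchanged.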

Upvotes: 2

Nullman

Reputation: 4279

One way of approaching this is with a translation:

translation = str.maketrans("", "", " \t,[]'")
file_strip = [item.translate(translation) for item in file_stored_in_list]

Another way is to use regular expressions:

import re
reg = re.compile(r'\D') # \D is anything other than digits
file_strip = [reg.sub('', item) for item in file_stored_in_list]

It's worth noting that strip(" ") doesn't work the way you expected: it only removes spaces from the beginning and end of the string, never from the middle. See the documentation.
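
A short sketch of that behaviour, with an illustrative padded string (`inner` and `row` are made-up names, not from the question):

```python
inner = "2        7        6"
row = "  " + inner + "  "   # same digits, padded with spaces on both ends

# strip(" ") trims only the leading and trailing spaces
print(row.strip(" ") == inner)       # True

# replace() removes every space, wherever it is
print(row.replace(" ", ""))          # 276

# split() followed by join() removes any whitespace, tabs included
print("".join(row.split()))          # 276
```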

Upvotes: 1
