Reputation: 539
The goal of my code is to write a function that returns a list of strings, in which successive strings (fruit names) correspond to the consecutive headers #NO.1 ... #NO.5. Each fruit name is split over multiple lines, and I want each name to appear in the list as a single string with no whitespace.
I expect my code to return:
['Pear', 'Apple', 'Cherry', 'Banana', 'Peach']
but I got:
['', 'Pear', 'Apple', 'Cherry', 'Banana', 'Peach']
These are the contents of my file fruit.txt:
#NO.1
P
ear
#NO.2
A
pp
l
e
#NO.3
Cherry
#NO.4
Banan
a
#NO.5
Pea
c
h
This is my code:
def read(filename):
    myfile = open(filename, 'r')
    seq = ''
    list1 = []
    for line in myfile:
        if line[0] != '#':
            seq += line.rstrip('\n')
        else:
            # Runs on every header line, including the first one, when seq is still ''
            list1.append(seq)
            seq = ''
    list1.append(seq)
    return list1
How can I avoid appending the empty string, which I don't want? I suppose I just need to move a certain line of code; any suggestion is appreciated.
Upvotes: 3
Views: 4058
Reputation: 16516
Quick fix for removing empty strings from a list:
list1 = list(filter(None, list1))  # list() is needed on Python 3, where filter returns an iterator
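As a minimal sketch of what this quick fix does, here it is applied to the list from the question. The variable name cleaned is only illustrative, and the list() wrapper assumes Python 3, where filter returns a lazy iterator rather than a list:

```python
# The list the question's function currently returns, with the unwanted leading ''.
list1 = ['', 'Pear', 'Apple', 'Cherry', 'Banana', 'Peach']

# filter(None, ...) keeps only truthy items, so empty strings are dropped.
# Wrapping in list() materializes the result on Python 3.
cleaned = list(filter(None, list1))
print(cleaned)
```

Note that this removes every empty string in the list, not just the leading one, so it treats the symptom rather than the cause inside the loop.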
How about this solution with regex? The following is a two-step process. First, all whitespace (newlines, spaces, etc.) is removed. Then all words following your pattern #NO.\d are found:
import re
whitespace = re.compile(r'\s+')          # one or more whitespace characters
fruitdef = re.compile(r'#NO\.\d(\w*)')   # capture the word after each header
with open('fruit', 'r') as f:
    inputfile = f.read()
inputstring = whitespace.sub('', inputfile)
fruits = fruitdef.findall(inputstring)
print(fruits)
['Pear', 'Apple', 'Cherry', 'Banana', 'Peach']
Condensed to a one-liner:
import re
print(re.findall(r'#NO\.\d(\w*)', re.sub(r'\s+', '', open('fruit', 'r').read())))
Upvotes: 1
Reputation: 495
Alternative if you'd like a single line solution:
with open('fruit.txt') as f:
    content = f.read()
output = [''.join(x.split('\n')[1:]) for x in content.split('#') if len(x.split('\n')) > 1]
Upvotes: 1
Reputation: 500683
You could change the line
else:
to
elif seq:
This checks whether seq is empty and only appends it if it is not.
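For reference, here is a sketch of the question's function with that one-line change applied. The name read_fruits, the with block, and the final "if seq" guard are my own assumptions and not part of the original code. The demo writes the sample data to a temporary file so it runs on its own:

```python
import os
import tempfile

def read_fruits(filename):
    """Return one joined string per '#' header block in the file."""
    fruits = []
    seq = ''
    with open(filename, 'r') as myfile:
        for line in myfile:
            if not line.startswith('#'):
                # Continuation line: glue it onto the current name.
                seq += line.strip()
            elif seq:
                # Header line: store the previous name only if there is one.
                fruits.append(seq)
                seq = ''
    # Assumption: skip the final append too if the file ends right after a header.
    if seq:
        fruits.append(seq)
    return fruits

# Demo with the file contents from the question.
sample = "#NO.1\nP\near\n#NO.2\nA\npp\nl\ne\n#NO.3\nCherry\n#NO.4\nBanan\na\n#NO.5\nPea\nc\nh\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name
try:
    result = read_fruits(path)
finally:
    os.remove(path)
print(result)
```

Because the first header is seen while seq is still empty, the elif branch is skipped and no empty string is appended.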
Upvotes: 4