Reputation: 5359
I have a test file with the following format:
one
two
three
four
=
five
six
seven
eight
=
nine
ten
one
two
=
I am writing Python code that reads the file into a list, with each line of the text as one item:
import sys

dump = sys.argv[1]
lines = []
with open(dump) as f:
    for line in f:
        x = line.strip()
        lines.append(x)
print(lines)
lines list:
['one', 'two', 'three', 'four', '=', 'five', 'six', 'seven', 'eight', '=', 'nine', 'ten', 'one', 'two', '=']
I then get the indexes of the equals signs in order to try to use those at a later point to make a new list, combining the strings:
equals_indexes = [i for i, x in enumerate(lines) if x == '=']
equals_indexes list:
[4, 9, 14]
I am good up until this point. Now I would like to join the strings one, two, three, four before the first index as new_list element 1. I would like to join the next group of strings between equals sign 1 and 2, and the next group of strings between equals sign 2 and 3 to produce the following:
[[one two three four], [five six seven eight], [nine ten one two]]
I have tried to do this by iterating over the list of equals indexes, then iterating over the list lines:
groups = []
for i in equals_indexes:
    sequences = ""
    for x, y in enumerate(lines):
        if x < i:
            sequences = ' '.join(lines[x:i])
            groups.append(sequences)
print(groups)
Which produces the following:
['one two three four', 'two three four', 'three four', 'four', 'one two three four = five six seven eight', 'two three four = five six seven eight', ....]
I understand why this is happening: for every x below i, the inner loop joins lines[x:i] and appends the result, so each equals index produces a whole series of overlapping strings. I am doing this because I have a large file with huge blocks of text, each corresponding to one iteration of a program. The separator between one iteration and the next is a line containing a single '='. Once I can split the list on the equals signs, I can parse each block separately. Any help would be great!
Upvotes: 1
Views: 995
Reputation: 12918
I think this gets you what you are looking for, although there is one part that is unclear. If you want to join the strings between equals signs as each element in your final list:
with open(dump) as f:
    full_string = ' '.join(line.strip() for line in f)
# skip empty pieces, e.g. the one left after the final '='
my_list = [string.strip() for string in full_string.split('=') if string.strip()]
print(my_list)
['one two three four', 'five six seven eight', 'nine ten one two']
If, instead, you want sub-lists comprising each string between the equals signs, just replace my_list above with:
my_list = [string.split() for string in full_string.split('=') if string.strip()]
[['one', 'two', 'three', 'four'], ['five', 'six', 'seven', 'eight'], ['nine', 'ten', 'one', 'two']]
As a bonus, both versions use list comprehensions, which are a more Pythonic way of looping.
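For comparison, the index-based idea from the question can also be made to work by slicing between consecutive separator positions. This is only a sketch: the `starts` name is introduced here for illustration, and it assumes the data ends with a `=` line, as in the sample (anything after the last `=` would be dropped).

```python
lines = ['one', 'two', 'three', 'four', '=',
         'five', 'six', 'seven', 'eight', '=',
         'nine', 'ten', 'one', 'two', '=']

# Positions of the separator lines, as computed in the question.
equals_indexes = [i for i, x in enumerate(lines) if x == '=']

# Each group starts just after the previous separator (or at 0 for the first).
starts = [0] + [i + 1 for i in equals_indexes[:-1]]

# Pair every start with the separator that closes its group and slice once.
groups = [' '.join(lines[start:end]) for start, end in zip(starts, equals_indexes)]
print(groups)
```

Each group is sliced exactly once, which avoids the overlapping strings produced by the nested loop in the question.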
Upvotes: 1
Reputation: 51653
Read lines until you hit an =, join them into one list entry and append it, then continue until the input is exhausted. After the loop, append whatever is left in the temporary list:
t = """one
two
three
four
=
five
six
seven
eight
=
nine
ten
one
two
="""
data = []  # global list
line = []  # temp list
for n in [x.strip() for x in t.splitlines()]:
    if n == "=":
        if line:
            data.append(' '.join(line))
            line = []
    else:
        line.append(n)
if line:
    data.append(' '.join(line))
print(data)
Output:
['one two three four', 'five six seven eight', 'nine ten one two']
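Since the question mentions a large file, the same loop could be wrapped in a generator so that only one block is held in memory at a time. The following is a rough sketch, not a drop-in replacement: `iter_blocks` is a hypothetical helper name, `io.StringIO` stands in for a real open file, and skipping blank lines is an assumption about what is wanted.

```python
import io


def iter_blocks(line_iter, sep="="):
    """Yield the lines between separators, joined into one string per block."""
    block = []
    for raw in line_iter:
        line = raw.strip()
        if line == sep:
            if block:
                yield " ".join(block)
                block = []
        elif line:  # assumption: blank lines carry no content and are skipped
            block.append(line)
    if block:  # trailing block that has no closing separator
        yield " ".join(block)


# io.StringIO stands in for an open file object here.
sample = io.StringIO("one\ntwo\n=\nthree\nfour\n=\n")
print(list(iter_blocks(sample)))
```

With a real file this would be used as `with open(dump) as f: for block in iter_blocks(f): ...`, processing each block as it is read.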
Upvotes: 1
Reputation: 390
Here's a small IDLE example:
>>> stuff = ['a', 'b', 'c', '=', 'd', 'e', '=', 'f', 'g']
>>> "".join(stuff).split('=')
['abc', 'de', 'fg']
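It joins all of the characters into one string (so you can skip separating them out into separate lists), and then splits that string on the = character. Because the join uses an empty string, this works as shown for single-character items; for whole words, join with ' ' instead and strip each resulting piece.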
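One caveat with join-then-split is that it splits on every = character, including one that appears inside an element. A possible alternative, sketched here with `itertools.groupby` from the standard library, compares whole elements against the separator instead:

```python
from itertools import groupby

stuff = ['a', 'b', 'c', '=', 'd', 'e', '=', 'f', 'g']

# groupby batches consecutive items that share a key; the key is True only
# for elements that are exactly the separator, and those batches are dropped.
groups = [
    ''.join(items)
    for is_sep, items in groupby(stuff, key=lambda s: s == '=')
    if not is_sep
]
print(groups)
```

For whole words, `' '.join(items)` would keep the words separated by spaces.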
Upvotes: 1