Reputation: 1359
I have data with the following structure:
[TimingPoints]
21082,410.958904109589,4,3,1,60,1,0
21082,-250,4,3,1,100,0,0
22725,-142.857142857143,4,3,1,100,0,0
23547,-166.666666666667,4,3,1,100,0,0
24369,-333.333333333335,4,3,1,100,0,0
27657,-200.000000000001,4,3,1,100,0,0
29301,-142.857142857143,4,3,1,100,0,0
30123,-166.666666666667,4,3,1,100,0,0
30945,-250,4,3,1,100,0,0
32588,-166.666666666667,4,3,1,100,0,0
34232,-250,4,3,1,100,0,0
35876,-142.857142857143,4,3,1,100,0,0
36698,-166.666666666667,4,3,1,100,0,0
37520,-250,4,3,1,100,0,0
42451,-142.857142857143,4,3,1,100,0,0
[HitObjects]
256,192,17794,12,0,20876,0:0:0:0:
159,96,21082,6,0,B|204:120|204:120|254:103|254:103|305:130|355:102,1,210
409,27,22725,2,0,P|446:96|405:179,1,171.499994766236
269,284,23547,2,0,B|317:250|324:193|324:193|328:220|350:236,1,146.999995513916
I'd like to read all lines under [TimingPoints] and before [HitObjects] into a list. Empty lines should be ignored. So the final list should contain:
21082,410.958904109589,4,3,1,60,1,0
21082,-250,4,3,1,100,0,0
22725,-142.857142857143,4,3,1,100,0,0
23547,-166.666666666667,4,3,1,100,0,0
24369,-333.333333333335,4,3,1,100,0,0
27657,-200.000000000001,4,3,1,100,0,0
29301,-142.857142857143,4,3,1,100,0,0
30123,-166.666666666667,4,3,1,100,0,0
30945,-250,4,3,1,100,0,0
32588,-166.666666666667,4,3,1,100,0,0
34232,-250,4,3,1,100,0,0
35876,-142.857142857143,4,3,1,100,0,0
36698,-166.666666666667,4,3,1,100,0,0
37520,-250,4,3,1,100,0,0
42451,-142.857142857143,4,3,1,100,0,0
I tried it with the following regex pattern:
\[TimingPoints\]((.|\n)*)\[HitObjects]
but it does not ignore the empty lines.
How can I match the lines to get the result described above?
Also, how can I load all the matched lines into a list with Python?
Upvotes: 2
Views: 616
Reputation: 9711
Don't get me wrong, I'm a huge fan of regex and use it daily. But it's a bit heavy for this task.
1) Read the file into a list, stripping surrounding whitespace (including newline characters) from each line and dropping any line that ends up empty
2) Find the indexes of '[TimingPoints]' and '[HitObjects]' and slice the list between them, which drops both headers
3) Done
path = './timing.txt'
with open(path, 'r') as f:
    text = [i.strip() for i in f if i.strip()]
# Keep only rows between the headers of interest.
result = text[text.index('[TimingPoints]')+1:text.index('[HitObjects]')]
['21082,410.958904109589,4,3,1,60,1,0',
'21082,-250,4,3,1,100,0,0',
'22725,-142.857142857143,4,3,1,100,0,0',
'23547,-166.666666666667,4,3,1,100,0,0',
'24369,-333.333333333335,4,3,1,100,0,0',
'27657,-200.000000000001,4,3,1,100,0,0',
'29301,-142.857142857143,4,3,1,100,0,0',
'30123,-166.666666666667,4,3,1,100,0,0',
'30945,-250,4,3,1,100,0,0',
'32588,-166.666666666667,4,3,1,100,0,0',
'34232,-250,4,3,1,100,0,0',
'35876,-142.857142857143,4,3,1,100,0,0',
'36698,-166.666666666667,4,3,1,100,0,0',
'37520,-250,4,3,1,100,0,0',
'42451,-142.857142857143,4,3,1,100,0,0']
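If you'd still rather use a regex, here is a minimal sketch of one way to do it. It is not the pattern from the question: it assumes the whole file fits in memory as one string, uses a non-greedy match with re.DOTALL to grab the block between the two headers, and then filters the empty lines in Python rather than inside the pattern. The function name timing_points and the sample string are made up for illustration.

```python
import re

def timing_points(text):
    """Return the non-empty lines between [TimingPoints] and [HitObjects]."""
    # re.DOTALL lets '.' match newlines; the non-greedy '.*?' stops at the
    # first [HitObjects] header instead of the last one.
    match = re.search(r"\[TimingPoints\](.*?)\[HitObjects\]", text, re.DOTALL)
    if match is None:
        # One or both headers are missing (or in the wrong order).
        return []
    # Drop blank lines after matching; this is simpler than excluding them in the regex.
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]

# Hypothetical sample input, shortened from the data in the question.
sample = """[TimingPoints]
21082,410.958904109589,4,3,1,60,1,0

21082,-250,4,3,1,100,0,0

[HitObjects]
256,192,17794,12,0,20876,0:0:0:0:
"""

print(timing_points(sample))
```

Filtering after the match keeps the pattern short and readable, which is most of the reason I'd avoid trying to skip blank lines inside the regex itself.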
Upvotes: 1
Reputation: 7187
I'm not the biggest regex fan you'll ever meet. Here's a straightforward way of doing it without regexes:
#!/usr/local/cpython-3.1/bin/python3
# Works on CPython 3.1 through 3.9
"""Keep lines between [TimingPoints] and [HitObjects], eliding empty lines."""


def main():
    """Read input and write actual-output."""
    # True only while we are between the two headers.
    display = False
    with open('input', 'r') as infile, open('actual-output', 'w') as outfile:
        for line in infile:
            line_sans_n = line.rstrip('\n')
            if not line_sans_n:
                # Skip blank lines
                continue
            if line_sans_n == '[HitObjects]':
                # End of the section: stop copying before the header is printed.
                display = False
            if display:
                print(line_sans_n, file=outfile)
            if line_sans_n == '[TimingPoints]':
                # Start copying from the line after this header.
                display = True


main()
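Since the question asks for a list rather than a file, the same flag-based loop can be sketched as a function that collects the lines and returns them. This is only an illustration under a few assumptions: the name lines_between and its start/end parameters are made up, it strips all surrounding whitespace (not just the newline), and it stops reading at the first end header after the start header.

```python
import io

def lines_between(lines, start='[TimingPoints]', end='[HitObjects]'):
    """Collect non-empty lines found after `start` and before `end`."""
    collected = []
    inside = False  # True only while we are between the two headers
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # Skip blank (or whitespace-only) lines
            continue
        if inside:
            if stripped == end:
                # Reached the closing header: nothing more to collect
                break
            collected.append(stripped)
        elif stripped == start:
            inside = True
    return collected

# Hypothetical in-memory sample; any iterable of lines (e.g. an open file) works.
sample_file = io.StringIO(
    "[TimingPoints]\n"
    "21082,410.958904109589,4,3,1,60,1,0\n"
    "\n"
    "21082,-250,4,3,1,100,0,0\n"
    "\n"
    "[HitObjects]\n"
    "256,192,17794,12,0,20876,0:0:0:0:\n"
)
timing_lines = lines_between(sample_file)
print(timing_lines)
```

With a real file, something like `with open('input') as infile: timing_lines = lines_between(infile)` should give the list directly.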
HTH
Upvotes: 1