Kyu96

Reputation: 1359

Python regex ignore empty lines

I have data with the following structure:

[TimingPoints]
21082,410.958904109589,4,3,1,60,1,0
21082,-250,4,3,1,100,0,0
22725,-142.857142857143,4,3,1,100,0,0
23547,-166.666666666667,4,3,1,100,0,0

24369,-333.333333333335,4,3,1,100,0,0
27657,-200.000000000001,4,3,1,100,0,0
29301,-142.857142857143,4,3,1,100,0,0
30123,-166.666666666667,4,3,1,100,0,0
30945,-250,4,3,1,100,0,0

32588,-166.666666666667,4,3,1,100,0,0
34232,-250,4,3,1,100,0,0
35876,-142.857142857143,4,3,1,100,0,0
36698,-166.666666666667,4,3,1,100,0,0
37520,-250,4,3,1,100,0,0
42451,-142.857142857143,4,3,1,100,0,0


[HitObjects]
256,192,17794,12,0,20876,0:0:0:0:
159,96,21082,6,0,B|204:120|204:120|254:103|254:103|305:130|355:102,1,210
409,27,22725,2,0,P|446:96|405:179,1,171.499994766236
269,284,23547,2,0,B|317:250|324:193|324:193|328:220|350:236,1,146.999995513916

I'd like to read all lines between [TimingPoints] and [HitObjects] into a list, ignoring empty lines. The final list should contain:

21082,410.958904109589,4,3,1,60,1,0
21082,-250,4,3,1,100,0,0
22725,-142.857142857143,4,3,1,100,0,0
23547,-166.666666666667,4,3,1,100,0,0
24369,-333.333333333335,4,3,1,100,0,0
27657,-200.000000000001,4,3,1,100,0,0
29301,-142.857142857143,4,3,1,100,0,0
30123,-166.666666666667,4,3,1,100,0,0
30945,-250,4,3,1,100,0,0
32588,-166.666666666667,4,3,1,100,0,0
34232,-250,4,3,1,100,0,0
35876,-142.857142857143,4,3,1,100,0,0
36698,-166.666666666667,4,3,1,100,0,0
37520,-250,4,3,1,100,0,0
42451,-142.857142857143,4,3,1,100,0,0

I tried the regex pattern \[TimingPoints\]((.|\n)*)\[HitObjects], but it does not ignore the empty lines. How can I match the lines to get the result described above, and how can I load all the matched lines into a list with Python?
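For reference, a minimal sketch of the regex route, assuming the whole file has already been read into one string. A single pattern cannot easily capture a block while also dropping the blank lines inside it, so the regex only isolates the block between the headers and Python filters out the empty lines afterwards. The function name `timing_points` is hypothetical.

```python
import re


def timing_points(text):
    """Return the non-empty lines between [TimingPoints] and [HitObjects]."""
    # re.DOTALL lets '.' match newlines, replacing the (.|\n)* construct;
    # the non-greedy .*? stops at the first [HitObjects].
    match = re.search(r'\[TimingPoints\](.*?)\[HitObjects\]', text, re.DOTALL)
    if match is None:
        return []
    # The regex keeps the blank lines; drop them here.
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


sample = """[TimingPoints]
21082,410.958904109589,4,3,1,60,1,0

21082,-250,4,3,1,100,0,0


[HitObjects]
256,192,17794,12,0,20876,0:0:0:0:
"""
print(timing_points(sample))
# ['21082,410.958904109589,4,3,1,60,1,0', '21082,-250,4,3,1,100,0,0']
```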

Upvotes: 2

Views: 616

Answers (2)

s3dev

Reputation: 9711

Don't get me wrong, I'm a huge fan of regex and use it daily. But it's a bit heavy for this task.

1) Read the file into a list, strip surrounding whitespace (including newline characters) from each line, and drop any line that is then empty.
2) Use list.index() to find the '[TimingPoints]' and '[HitObjects]' headers, and slice out the rows between them (excluding both headers).
3) Done.

Sample Code:

path = './timing.txt'

with open(path, 'r') as f:
    text = [i.strip() for i in f if i.strip()]

# Keep only rows between the headers of interest.
result = text[text.index('[TimingPoints]')+1:text.index('[HitObjects]')]

Output:

['21082,410.958904109589,4,3,1,60,1,0',
 '21082,-250,4,3,1,100,0,0',
 '22725,-142.857142857143,4,3,1,100,0,0',
 '23547,-166.666666666667,4,3,1,100,0,0',
 '24369,-333.333333333335,4,3,1,100,0,0',
 '27657,-200.000000000001,4,3,1,100,0,0',
 '29301,-142.857142857143,4,3,1,100,0,0',
 '30123,-166.666666666667,4,3,1,100,0,0',
 '30945,-250,4,3,1,100,0,0',
 '32588,-166.666666666667,4,3,1,100,0,0',
 '34232,-250,4,3,1,100,0,0',
 '35876,-142.857142857143,4,3,1,100,0,0',
 '36698,-166.666666666667,4,3,1,100,0,0',
 '37520,-250,4,3,1,100,0,0',
 '42451,-142.857142857143,4,3,1,100,0,0']
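The same strip/filter/slice approach can be wrapped in a function that accepts any iterable of lines, so it works on an open file or on io.StringIO for quick testing. This is a sketch; the function name `between_headers` and its default arguments are hypothetical. Note that list.index raises ValueError if either header is missing.

```python
import io


def between_headers(lines, start='[TimingPoints]', end='[HitObjects]'):
    """Strip each line, drop empty ones, and return the rows between two headers."""
    text = [line.strip() for line in lines if line.strip()]
    # list.index raises ValueError if a header is not present.
    return text[text.index(start) + 1:text.index(end)]


demo = io.StringIO(
    "[TimingPoints]\n"
    "21082,-250,4,3,1,100,0,0\n"
    "\n"
    "22725,-142.857142857143,4,3,1,100,0,0\n"
    "\n"
    "[HitObjects]\n"
    "256,192,17794,12,0,20876,0:0:0:0:\n"
)
print(between_headers(demo))
# ['21082,-250,4,3,1,100,0,0', '22725,-142.857142857143,4,3,1,100,0,0']
```

With a real file, `with open(path) as f: rows = between_headers(f)` works the same way, because a file object iterates over its lines.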

Upvotes: 1

dstromberg

Reputation: 7187

I'm not the biggest regex fan you'll ever meet. Here's a straightforward way of doing it without regexes:

#!/usr/local/cpython-3.1/bin/python3

# Works on CPython 3.1 through 3.9

"""Keep lines between [TimingPoints] and [HitObjects], eliding empty lines."""


def main():
    """Read input and write actual-output."""
    display = False
    with open('input', 'r') as infile, open('actual-output', 'w') as outfile:
        for line in infile:
            line_sans_n = line.rstrip('\n')
            if not line_sans_n:
                # Skip blank lines
                continue
            if line_sans_n == '[HitObjects]':
                # Stop copying once the next section starts
                display = False
            if display:
                print(line_sans_n, file=outfile)
            if line_sans_n == '[TimingPoints]':
                # Start copying with the line after this header
                display = True


if __name__ == '__main__':
    main()

HTH
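Since the question asks for a list rather than an output file, the same flag-based loop can be written as a generator and passed to list(). This is a sketch under two assumptions: there is only one [TimingPoints] section, and nothing after [HitObjects] is needed, so it stops reading there instead of scanning the rest of the file. The name `section_lines` is hypothetical.

```python
import io


def section_lines(lines, start='[TimingPoints]', end='[HitObjects]'):
    """Yield non-empty lines after `start`, up to but not including `end`."""
    inside = False
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        if line == end:
            break  # assumes nothing after the end header is needed
        if inside:
            yield line
        if line == start:
            inside = True


sample = io.StringIO(
    "[TimingPoints]\n"
    "21082,-250,4,3,1,100,0,0\n"
    "\n"
    "30945,-250,4,3,1,100,0,0\n"
    "[HitObjects]\n"
    "256,192,17794,12,0,20876,0:0:0:0:\n"
)
print(list(section_lines(sample)))
# ['21082,-250,4,3,1,100,0,0', '30945,-250,4,3,1,100,0,0']
```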

Upvotes: 1
