Splitting up string of text based on keywords python

Question

I have a string of text like this:

'tx cycle up.... down
rx cycle up.... down
phase:...
rx on scan: 123456
tx cycle up.... down
rx cycle up.... down
phase:...
rx on scan: 789012
setup
tx cycle up.... down
rx cycle up.... down
tx cycle up.... down
rx cycle up.... down'

I need to split this string into a list of strings, one per chunk, like this:

['tx cycle up.... down rx cycle up.... down phase:... rx on scan: 123456',
 'tx cycle up.... down rx cycle up.... down phase:... rx on scan: 789012',
 'tx cycle up.... down rx cycle up.... down',
 'tx cycle up.... down rx cycle up.... down']

Sometimes a chunk has a 'phase' line and a 'scan' number and sometimes it does not, so the solution needs to be general enough to handle both cases. I will have to apply it to a lot of data.

Basically, I want to split it into a list of strings where each element runs from one occurrence of 'tx' up to, but not including, the next 'tx'. How can I do this?

Edit: Suppose that in addition to the string of text above I have other strings of text that appear like this:

'closeloop start
closeloop ..up:677 down:098
closeloop start
closeloop ..up:568 down:123'

My code goes through each of these strings and splits it into a list with the splitting code. But when it reaches this string, it finds nothing to split on. How can I split at the 'closeloop start' lines when they appear, and at the tx lines as before when those appear? I tried this code, but I got a TypeError:

data = re.split(r'\n((?=tx)|(?=closeloop\sstart))', data)

Martijn Pieters · Accepted Answer

You can split on newlines that are followed by tx:

import re

re.split(r'\n(?=tx)', inputtext)
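The look-ahead (?=tx) is what keeps tx at the start of each piece: it only checks that tx comes next, so the separator consumes nothing but the newline. A minimal comparison on a made-up sample string (not taken from the question):

```python
import re

text = 'tx a\nrx b\ntx c\nrx d'

# Consuming 'tx' as part of the separator removes it from the pieces
consumed = re.split(r'\ntx', text)
# A look-ahead only checks that 'tx' follows; only the newline is consumed
kept = re.split(r'\n(?=tx)', text)

print(consumed)  # ['tx a\nrx b', ' c\nrx d']
print(kept)      # ['tx a\nrx b', 'tx c\nrx d']
```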

Demo:

>>> import re
>>> inputtext = '''tx cycle up.... down
... rx cycle up.... down
... phase:...
... rx on scan: 123456
... tx cycle up.... down
... rx cycle up.... down
... phase:...
... rx on scan: 789012
... setup
... tx cycle up.... down
... rx cycle up.... down
... tx cycle up.... down
... rx cycle up.... down'''
>>> re.split(r'\n(?=tx)', inputtext)
['tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 123456', 'tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 789012\nsetup', 'tx cycle up.... down\nrx cycle up.... down', 'tx cycle up.... down\nrx cycle up.... down']
>>> from pprint import pprint
>>> pprint(_)
['tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 123456',
 'tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 789012\nsetup',
 'tx cycle up.... down\nrx cycle up.... down',
 'tx cycle up.... down\nrx cycle up.... down']
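The question's expected output shows each chunk on a single line, with spaces in place of newlines. A minimal post-processing sketch, assuming it is acceptable to collapse every run of whitespace into one space (the variable name flat is illustrative):

```python
import re

inputtext = '''tx cycle up.... down
rx cycle up.... down
phase:...
rx on scan: 123456
tx cycle up.... down
rx cycle up.... down'''

chunks = re.split(r'\n(?=tx)', inputtext)
# str.split() with no argument splits on any whitespace run, newlines included;
# joining with ' ' turns each chunk into a single line
flat = [' '.join(chunk.split()) for chunk in chunks]
print(flat)
# ['tx cycle up.... down rx cycle up.... down phase:... rx on scan: 123456', 'tx cycle up.... down rx cycle up.... down']
```

Lines that do not start with tx, such as setup in the demo, still stay attached to the preceding chunk; dropping them would need an extra filtering rule.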

However, if you are reading the input from a file, you could instead loop over the file object line by line and process each block as you gather its lines:

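# open_file_object is an already opened file; process_section is your own
# function that handles one block (a list of lines)
section = []
for line in open_file_object:
    if line.startswith('tx'):
        # a new section starts: handle the previous one first
        if section:
            process_section(section)
        section = [line]
    else:
        # continuation line of the current section
        section.append(line)
# handle the last section, which has no following 'tx' line to trigger it
if section:
    process_section(section)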
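The same loop can be written as a generator, so the caller decides what to do with each section instead of calling process_section. A sketch, where iter_sections and its start parameter are hypothetical names and io.StringIO stands in for a real open file:

```python
import io

def iter_sections(lines, start='tx'):
    """Yield lists of lines, starting a new list at each line beginning with `start`."""
    section = []
    for line in lines:
        if line.startswith(start) and section:
            # a new section begins: hand back the one gathered so far
            yield section
            section = []
        section.append(line)
    # the last section has no following start line to trigger it
    if section:
        yield section

# io.StringIO stands in for a real open file object here
fake_file = io.StringIO('tx 1\nrx 1\nphase:...\ntx 2\nrx 2\n')
sections = [''.join(s) for s in iter_sections(fake_file)]
print(sections)  # ['tx 1\nrx 1\nphase:...\n', 'tx 2\nrx 2\n']
```

Because the function accepts any iterable of lines, it works the same on an open file, a list of strings, or text.splitlines(keepends=True).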

If you need to match multiple starting lines, include each as a |-separated alternative in the look-ahead:

data = re.split(r'\n(?=tx|closeloop\sstart)', data)
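When many inputs use different block markers, the alternation can be built from a list of keywords. A sketch, assuming the hypothetical helper name split_on_starts and that each keyword should match literally (hence re.escape). It also illustrates a separate pitfall in the question's pattern: its outer parentheses form a capturing group, and re.split inserts the text of every captured group into the result, here an empty string between chunks.

```python
import re

def split_on_starts(text, starts=('tx', 'closeloop start')):
    """Split text before every line that begins with one of `starts`."""
    # re.escape keeps keyword characters literal; the look-ahead uses only a
    # non-capturing group, so re.split returns just the chunks themselves
    alternatives = '|'.join(re.escape(s) for s in starts)
    return re.split(rf'\n(?=(?:{alternatives}))', text)

closeloop = ('closeloop start\n'
             'closeloop ..up:677 down:098\n'
             'closeloop start\n'
             'closeloop ..up:568 down:123')
print(split_on_starts(closeloop))
# ['closeloop start\ncloseloop ..up:677 down:098', 'closeloop start\ncloseloop ..up:568 down:123']

# With a capturing group, re.split also returns the (empty) captured text
print(re.split(r'\n((?=tx)|(?=closeloop\sstart))', closeloop))
# ['closeloop start\ncloseloop ..up:677 down:098', '', 'closeloop start\ncloseloop ..up:568 down:123']
```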
