splitting line in python

Question

I have data in the following form:

   
   <"a Country's Welfare">
   <"Happy Facebooking => Enjoy">

Now I want to split each line given above based on the delimiter <>. That is, I want to split it as:
['', '', '', ''] ['', '', '', '<"a Country\'s Welfare">'] ['', '', '', '<"Happy Facebooking => Enjoy">']

I tried splitting on a space and on "> ", but neither works. Is there some other way in Python to split in the manner described above? My file is 1 TB in size, so I cannot do this manually.

Martijn Pieters · Accepted Answer

You want to split on the whitespace between the > and < characters. For that you need a regular expression split with look-behind and look-ahead assertions:

import re

re.split(r'(?<=>)\s+(?=<)', line)

This splits on any whitespace (\s+) that is preceded by a > and followed by a < character.

The (?<=...) expression is a look-behind assertion; it matches a location in the input text, namely anywhere the pattern inside the assertion precedes the location. In the above it matches anywhere there is a > character just before the current location.

The (?=...) expression works just like the look-behind assertion, but instead looks for matching characters after the current location. It is known as a look-ahead assertion. (?=<) means it'll match at any location that is followed by the < character.

Together these form two anchors, and the \s+ in between will only match whitespace that sits between a > and a <, but not those two characters themselves. The split breaks up the input string by removing the matched text, and only the spaces are matched, leaving the > and < characters attached to the text being split.
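
To see why the assertions matter, here is a minimal sketch that compares the look-around pattern with a plain pattern that consumes the brackets. The sample line and its tokens <x>, <y> and <z> are made-up placeholders, not data from the question:

```python
import re

# Hypothetical sample line; <x>, <y>, <z> are placeholder tokens.
line = '<x> <y> <z> <"Happy Facebooking => Enjoy">'

# Plain pattern: the > and < are part of the match, so split() removes them.
consumed = re.split(r'>\s+<', line)

# Look-around pattern: only the whitespace is matched, so the brackets stay.
kept = re.split(r'(?<=>)\s+(?=<)', line)

print(consumed)
print(kept)
```

Note that the space after => inside the quoted text is left alone by both patterns, because it is not followed by a < character.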

Demo:

>>> re.split(r'(?<=>)\s+(?=<)', '   ')
['', '', '', '']
>>> re.split(r'(?<=>)\s+(?=<)', '''   <"a Country's Welfare">''')
['', '', '', '<"a Country\'s Welfare">']
>>> re.split(r'(?<=>)\s+(?=<)', '   <"Happy Facebooking => Enjoy">')
['', '', '', '<"Happy Facebooking => Enjoy">']
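
Since the file in the question is about 1 TB, one possible approach is to compile the pattern once and stream the file line by line rather than reading it all into memory. The following is only a sketch: the helper name split_lines, the file name data.txt and the <x> <y> <z> tokens are assumptions for illustration.

```python
import io
import re
from typing import Iterable, Iterator, List

# Compile once so the pattern is not re-parsed for every line.
SPLIT_RE = re.compile(r'(?<=>)\s+(?=<)')

def split_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the split fields for each line, one line at a time."""
    for line in lines:
        # Drop the trailing newline so it does not stick to the last field.
        yield SPLIT_RE.split(line.rstrip('\n'))

# Small in-memory stand-in for the real file (placeholder tokens).
sample = io.StringIO('<x> <y> <z> <"a Country\'s Welfare">\n<x> <y>\n')
result = list(split_lines(sample))
print(result)

# With a real file (hypothetical path), iterate lazily instead of building a list:
# with open('data.txt', encoding='utf-8') as fh:
#     for fields in split_lines(fh):
#         ...  # process one line's fields
```

A file object yields one line at a time, so memory use stays bounded by the longest line rather than by the file size.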
