donie

Reputation: 35

Join lines that are separated by empty lines in Python

I want to reformat the text below using Python:

text = """17/05/2013 10:09:15,INFO,xxxxxxxxxx
yyyyyy
zzzzzz

17/05/2013 10:09:15,INFO,xxxxxxxx
yyyyyyy
zzzzzzz"""

into this format:

17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz

I tried this:

def strip(txt):
    ret = ""
    for l in txt.split("\n"):
        if l.strip() in ['\n', '\r\n']:
            ret = ret + "\n"
        else:
            ret = ret + l.strip()
    print ret

But it turns out the code doesn't recognize the empty line, and the result looks like this:

17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz

How do I solve this?

Upvotes: 1

Views: 257

Answers (5)

eyquem

Reputation: 27575

import re

text = """17/05/2013 10:09:15,INFO,xxxxxxxxxx
yyyyyy
zzzzzz

17/05/2013 10:09:15,INFO,xxxxxxxx
yyyyyyy
zzzzzzz"""

pat = r'(\d\d/\d\d/\d{4} \d\d:\d\d:\d\d,INFO,.*)\n(.*)\n(.*)'
regx = re.compile(pat)

print text
print '\n===================\n'
print '\n'.join('%s%s%s' % x for x in regx.findall(text))

EDIT

jamylak's solution is better than mine. However, the regex pattern can be improved as follows so that a run of several empty lines collapses into a single line break:

>>> import re
>>> text = """17/05/2013 10:09:15,INFO,xxxxxxxxxx
yyyyyy
zzzzzz





17/05/2013 10:09:15,INFO,xxxxxxxx
yyyyyyy
zzzzzzz"""
>>> print re.sub('(?<=\n)\n+(?=\n)|\n(?!\n)', '', text)
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz
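
To make the difference between the two patterns visible, here is a minimal sketch in Python 3 syntax (the snippets in this thread use Python 2 `print` statements). The short `sample` string and the names `simple` and `improved` are made up for illustration.

```python
import re

# Hypothetical sample: two records separated by three blank lines,
# i.e. a run of four consecutive newlines.
sample = "a1\nb1\nc1\n\n\n\na2\nb2\nc2"

# Removes every newline that is not followed by another newline,
# so a run of N newlines shrinks to N - 1.
simple = re.sub(r'\n(?!\n)', '', sample)

# Also removes the interior of a run of newlines,
# so only the first newline of each run survives.
improved = re.sub(r'(?<=\n)\n+(?=\n)|\n(?!\n)', '', sample)

print(repr(simple))
print(repr(improved))
```

With a single blank line between records, both patterns give the same result; they only differ when blank lines repeat.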

Upvotes: 0

waitingkuo

Reputation: 93754

If you feel comfortable with regular expressions:

In [5]: import re
In [6]: print re.sub('(?<!\n)\n', '', text)
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz

Upvotes: 1

jamylak

Reputation: 133534

>>> import re
>>> text = """17/05/2013 10:09:15,INFO,xxxxxxxxxx
yyyyyy
zzzzzz

17/05/2013 10:09:15,INFO,xxxxxxxx
yyyyyyy
zzzzzzz"""
>>> print re.sub('\n(?!\n)', '', text)
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz

Upvotes: 2

mgilson

Reputation: 309881

I think I might try itertools.groupby:

from itertools import groupby
lines = text.splitlines()
def not_blank(x):
    return bool(x.strip())
print '\n'.join(''.join(v) for b,v in groupby(lines,not_blank) if b)

This works for any number of blank lines between groups, which may be desirable.
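
For reference, the same groupby idea can be sketched as a reusable function in Python 3 syntax. The name `join_groups` and the `sample` string are hypothetical, chosen only for this illustration; whitespace-only lines are treated as blank.

```python
from itertools import groupby

def join_groups(text):
    """Join each run of non-blank lines into one line (hypothetical helper)."""
    lines = text.splitlines()
    # groupby yields (key, group) pairs for consecutive lines sharing a key;
    # the key is True for lines with visible content and False for blank ones.
    groups = groupby(lines, key=lambda line: bool(line.strip()))
    # Keep only the groups of non-blank lines and glue each one together.
    return '\n'.join(''.join(group) for has_content, group in groups if has_content)

# Assumed sample: two records separated by blank and whitespace-only lines.
sample = "a1\nb1\n\n   \n\na2\nb2\n"
print(join_groups(sample))
```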

Upvotes: 1

TerryA

Reputation: 59974

You can split the text into its two records, since they are separated by two newlines:

>>> mylist = text.split('\n\n')

Then just print each value, removing the newlines inside each record:

>>> for i in mylist:
...     print i.replace('\n','')
... 
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz

Or if you want to store each line in a list, use a list comprehension:

>>> [i.replace('\n','') for i in mylist]
['17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz', '17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz']
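
Note that `text.split('\n\n')` assumes exactly one empty line between records. If the input might contain several blank lines in a row, one possible variation (a sketch in Python 3 syntax, with `join_records` as a hypothetical helper name) is to split on a regular expression instead:

```python
import re

def join_records(text):
    """Return one joined string per blank-line-separated record (hypothetical helper)."""
    # Split wherever a newline is followed by optional whitespace and another
    # newline, so any number of blank lines counts as a single separator.
    records = re.split(r'\n\s*\n', text.strip())
    # Drop the newlines inside each record and skip empty pieces.
    return [record.replace('\n', '') for record in records if record]

# Assumed sample: two records separated by three blank lines.
print(join_records("a1\nb1\n\n\n\na2\nb2\n"))
```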

Upvotes: 4
