Reputation: 35
I want to reformat the text below using Python:
text = """17/05/2013 10:09:15,INFO,xxxxxxxxxx
yyyyyy
zzzzzz

17/05/2013 10:09:15,INFO,xxxxxxxx
yyyyyyy
zzzzzzz"""
I want to format it into:
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz
I tried this:
def strip(txt):
    ret=""
    for l in txt.split("\n"):
        if l.strip() in ['\n', '\r\n']:
            ret = ret + "\n"
        else:
            ret = ret + l.strip()
    print ret
But it turns out the code doesn't recognize the empty line, and the result looks like this:
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz
How do I solve this?
Upvotes: 1
Views: 257
Reputation: 27575
import re
text = """17/05/2013 10:09:15,INFO,xxxxxxxxxx
yyyyyy
zzzzzz

17/05/2013 10:09:15,INFO,xxxxxxxx
yyyyyyy
zzzzzzz"""
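# Raw string, so the backslashes reach the regex engine unchanged.
pat = r'(\d\d/\d\d/\d{4} \d\d:\d\d:\d\d,INFO,.*)\n(.*)\n(.*)'
regx = re.compile(pat)
print(text)
print('\n===================\n')
# Join the three captured lines of each record without a separator.
print('\n'.join(''.join(x) for x in regx.findall(text)))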
jamylak's solution is better than mine. But the regex pattern can be improved as follows, so that a run of several empty lines is also collapsed:
>>> import re
>>> text = """17/05/2013 10:09:15,INFO,xxxxxxxxxx
yyyyyy
zzzzzz

17/05/2013 10:09:15,INFO,xxxxxxxx
yyyyyyy
zzzzzzz"""
>>> print(re.sub('(?<=\n)\n+(?=\n)|\n(?!\n)', '', text))
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz
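To make the difference between the two patterns visible, here is a small sketch in Python 3 syntax. The `sample` string, with three empty lines between two shortened records, is made up for illustration and is not the data from the question:

```python
import re

# Hypothetical sample: two records separated by three empty lines.
sample = "a,INFO,xx\nyy\nzz\n\n\n\nb,INFO,xx\nyy\nzz"

# Simple pattern: drops every newline that is not followed by another newline.
simple = re.sub(r'\n(?!\n)', '', sample)

# Improved pattern: additionally drops the surplus newlines inside a run.
improved = re.sub(r'(?<=\n)\n+(?=\n)|\n(?!\n)', '', sample)

print(repr(simple))    # 'a,INFO,xxyyzz\n\n\nb,INFO,xxyyzz'
print(repr(improved))  # 'a,INFO,xxyyzz\nb,INFO,xxyyzz'
```

With the simple pattern the extra empty lines survive in the result; with the improved one exactly one newline is left between the records.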
Upvotes: 0
Reputation: 93754
If you feel comfortable with regular expressions:
In [5]: import re
In [6]: print(re.sub('(?<=[^\n])\n', '', text))
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz
Upvotes: 1
Reputation: 133534
>>> import re
>>> text = """17/05/2013 10:09:15,INFO,xxxxxxxxxx
yyyyyy
zzzzzz

17/05/2013 10:09:15,INFO,xxxxxxxx
yyyyyyy
zzzzzzz"""
>>> print(re.sub('\n(?!\n)', '', text))
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz
Upvotes: 2
Reputation: 309881
I think I might try itertools.groupby:
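from itertools import groupby

lines = text.splitlines()

def is_not_blank(x):
    # True for lines that contain something other than whitespace
    return bool(x.strip())

# Keep only the runs of non-blank lines and glue each run together
print('\n'.join(''.join(v) for b, v in groupby(lines, is_not_blank) if b))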
This ends up being insensitive to the number of blank lines between groups, which may be desirable.
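As a rough sketch of that last point (Python 3 syntax; the function name `join_records` and the `sample` string are made up for illustration), the same idea wrapped in a function and run on input with several blank and whitespace-only lines between the groups:

```python
from itertools import groupby

def join_records(text):
    """Concatenate the lines of each record; blank lines separate records."""
    lines = text.splitlines()
    # groupby yields (key, run) pairs; the key is True for runs of non-blank lines.
    runs = groupby(lines, key=lambda line: bool(line.strip()))
    return '\n'.join(''.join(run) for has_content, run in runs if has_content)

# Hypothetical sample: the separator is an empty line, a whitespace-only
# line, and another empty line.
sample = "a,INFO,xx\nyy\nzz\n\n   \n\nb,INFO,xx\nyy\nzz\n"
result = join_records(sample)
print(result)
# a,INFO,xxyyzz
# b,INFO,xxyyzz
```

Because the whole run of blank lines forms a single group that is skipped, it does not matter how many of them there are.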
Upvotes: 1
Reputation: 59974
You can split the text into its two records, since they are separated by two newlines (that is, a blank line):
>>> mylist = text.split('\n\n')
Then just print each record, getting rid of the newlines inside it:
>>> for i in mylist:
...     print(i.replace('\n',''))
...
17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz
17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz
Or if you want to store each joined record in a list, use a list comprehension:
>>> [i.replace('\n','') for i in mylist]
['17/05/2013 10:09:15,INFO,xxxxxxxxxxyyyyyyzzzzzz', '17/05/2013 10:09:15,INFO,xxxxxxxxyyyyyyyzzzzzzz']
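Note that `split('\n\n')` assumes exactly one blank line between records. If there may be more, one possible variation is to split on any run of two or more newlines with `re.split`. This is only a sketch in Python 3 syntax; `split_records` and `sample` are names I made up for illustration:

```python
import re

def split_records(text):
    # Trim leading/trailing newlines, then split on runs of two or more newlines.
    chunks = re.split(r'\n{2,}', text.strip('\n'))
    # Drop the remaining newlines inside each record.
    return [chunk.replace('\n', '') for chunk in chunks]

# Hypothetical sample: two records separated by two blank lines.
sample = "a,INFO,xx\nyy\nzz\n\n\nb,INFO,xx\nyy\nzz\n"
records = split_records(sample)
print(records)  # ['a,INFO,xxyyzz', 'b,INFO,xxyyzz']
```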
Upvotes: 4