Reputation: 642
I'm creating a Python/Django app and I need to clean up a string. The main problem is that the string has too many line breaks in some parts. I don't want to delete all the line breaks, just the excess ones. How can I achieve this in Python? I'm using Python 2.7 and Django 1.6.
Upvotes: 0
Views: 3384
Reputation: 4793
This is the best I could do, but Peter DeGlopper's answer is better.
import re
s = '\n' * 9 + 'abc' + '\n'*10
# s == '\n\n\n\n\n\n\n\n\nabc\n\n\n\n\n\n\n\n\n\n\n'
lines = re.compile('\n+')
excess_lines = lines.findall(s)
# excess_lines == ['\n' * 9, '\n' * 10]
# I feel as though there is a better way, but this works
def cmplen(first, second):
    '''
    Function to order strings in descending order by length.
    Needed so that we replace longer strings of newlines first.
    '''
    if len(first) < len(second):
        return 1
    elif len(first) > len(second):
        return -1
    else:
        return 0
excess_lines.sort(cmp=cmplen)  # Python 2 only: sort() takes no cmp argument in Python 3
# excess_lines == ['\n' * 10, '\n' * 9]
for run in excess_lines:  # renamed from `lines` so the compiled regex is not shadowed
    s = s.replace(run, '\n')
# s == '\nabc\n'
This solution feels dirty and inelegant, but it works. You need to sort by string length because otherwise a shorter run can eat part of a longer one: given the string '\n\n\n aaaaaaa \n\n\n\n', replacing '\n\n\n' first turns the '\n\n\n\n' into '\n\n', which no later replacement will catch.
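For what it's worth, the same longest-first ordering can be had without a comparison function. This is only a sketch, assuming that sorting with `key=len` and `reverse=True` is acceptable (it runs on both Python 2.7 and 3); `collapse_by_replace` is a hypothetical name, not something from the answer above:

```python
import re

def collapse_by_replace(s):
    """Collapse each run of newlines to one, replacing the longest runs first."""
    runs = re.findall(r'\n+', s)
    # Longest first, so a shorter run never eats part of a longer one.
    for run in sorted(set(runs), key=len, reverse=True):
        s = s.replace(run, '\n')
    return s

s = '\n' * 9 + 'abc' + '\n' * 10
result = collapse_by_replace(s)
```

Here `result` should come out as a single newline on each side of 'abc', the same as the cmp-based version.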
Upvotes: 0
Reputation: 37319
A regexp is one way. Using your updated sample input:
>>> a = "This is my sample text.\r\n\r\n\r\n\r\n\r\n Here start another sample text"
>>> import re
>>> re.sub(r'(\r\n){2,}','\r\n', a)
'This is my sample text.\r\n Here start another sample text'
r'(\r\n)+' would work too; I just like using the 2+ lower bound to avoid replacing singleton \r\n substrings with the same substring.
Or you can use the splitlines method on the string and rejoin after filtering:
>>> '\r\n'.join(line for line in a.splitlines() if line)
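If the input might mix Windows and Unix line endings, the regexp can be loosened a little. A minimal sketch, assuming that only \r\n and \n line breaks occur; `collapse_line_breaks` and its `newline` parameter are hypothetical names used for illustration:

```python
import re

def collapse_line_breaks(text, newline='\r\n'):
    """Replace every run of two or more line breaks with a single one."""
    # \r?\n matches one Windows (\r\n) or Unix (\n) line break.
    # A lone line break is left untouched because of the {2,} lower bound.
    return re.sub(r'(?:\r?\n){2,}', newline, text)

a = "This is my sample text.\r\n\r\n\r\n\r\n\r\n Here start another sample text"
cleaned = collapse_line_breaks(a)
```

One difference to keep in mind: the splitlines version also drops a trailing line break at the end of the string, while the regexp version keeps one.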
Upvotes: 1
Reputation: 126
To use a regex to replace multiple occurrences of newline with a single one (or something else you prefer such as a period, tab or whatever), try:
import re
testme = 'Some text.\nSome more text.\n\nEven more text.\n\n\n\n\nThe End'
print re.sub('\n+', '\n', testme)
Note that '\n' is a single character (a newline), not two characters (a literal backslash and 'n').
You can of course compile the regex in advance if you intend to re-use it:
pattern = re.compile('\n+')
print pattern.sub('\n', testme)
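Since the question asks to remove only the excess line breaks, it may be worth keeping one blank line between paragraphs rather than collapsing everything to a single newline. A rough sketch under that assumption; `limit_blank_lines` and `max_newlines` are hypothetical names, not part of the re module:

```python
import re

def limit_blank_lines(text, max_newlines=2):
    """Shorten any run of newlines longer than max_newlines to exactly that many."""
    # e.g. max_newlines=2 builds the pattern \n{3,}
    pattern = re.compile(r'\n{%d,}' % (max_newlines + 1))
    return pattern.sub('\n' * max_newlines, text)

testme = 'Some text.\nSome more text.\n\nEven more text.\n\n\n\n\nThe End'
trimmed = limit_blank_lines(testme)
```

With the default of two newlines, the existing paragraph break stays as it is and only the run of five newlines before 'The End' is shortened. Passing `max_newlines=1` should behave like the '\n+' substitution above.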
Upvotes: 0
Reputation:
import re
a = 'string with \n a \n\n few too many\n\n\n lines'
re.sub('\n+', '\n', a)
Upvotes: 0
Reputation: 795
As an example, if you know what you want to replace:
>>> a = 'string with \n a \n\n few too many\n\n\n lines'
>>> a.replace('\n'*2, '\n') # Replaces \n\n with just \n
'string with \n a \n few too many\n\n lines'
>>> a.replace('\n'*3, '') # Replaces \n\n\n with nothing...
'string with \n a \n\n few too many lines'
Or, using a regular expression to find what you want:
>>> import re
>>> re.findall(r'.*([\n]+).*', a)
['\n', '\n\n', '\n\n\n']
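If you also want to know where the excess runs are before replacing anything, re.finditer gives you the match positions. A small sketch, assuming that a run of two or more newlines counts as excess; `find_newline_runs` is a hypothetical helper name:

```python
import re

def find_newline_runs(text, min_length=2):
    """Return (start_index, run_length) for each run of at least min_length newlines."""
    pattern = re.compile(r'\n{%d,}' % min_length)
    return [(m.start(), len(m.group())) for m in pattern.finditer(text)]

a = 'string with \n a \n\n few too many\n\n\n lines'
runs = find_newline_runs(a)
```

The single newline after 'string with ' is skipped, so `runs` only describes the double and triple runs.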
Upvotes: 0