Andrés Root
Andrés Root

Reputation: 642

Python string clean up: How can I remove the excess of newlines of a string in python?

I'm creating a Python/Django app and I need to clean up a string, but the main problem I have is that the string has too many line breaks in some parts. I don't want to delete all the line breaks, just the excess of them. How can I archive this in python? I'm using Python 2.7 and Django 1.6

Upvotes: 0

Views: 3384

Answers (5)

limasxgoesto0
limasxgoesto0

Reputation: 4793

Best I could do, but Peter DeGlopper's was better.

import re
s = '\n' * 9 + 'abc' + '\n'*10
# s == '\n\n\n\n\n\n\n\n\nabc\n\n\n\n\n\n\n\n\n\n\n'
lines = re.compile('\n+')
excess_lines = lines.findall(s)
# excess_lines == ['\n' * 9, '\n' * 10]
# I feel as though there is a better way, but this works

def cmplen(first, second):
    '''
    Function to order strings in descending order by length
    Needed so that we replace longer strings of new lines first
    '''

    if len(first) < len(second):
        return 1
    elif len(first) > len(second):
        return -1
    else:
        return 0

excess_lines.sort(cmp=cmplen)
# excess_lines == ['\n' * 10, '\n' * 9]
for lines in excess_lines:
    s = s.replace(lines, '\n')

# s = '\nabc\n'

This solution feels dirty and inelegant, but it works. You need to sort by string length because if you have a string '\n\n\n aaaaaaa \n\n\n\n' and do a replace(), the \n\n\n will replace \n\n\n\n with \n\n, and not be caught later on.

Upvotes: 0

Peter DeGlopper
Peter DeGlopper

Reputation: 37319

A regexp is one way. Using your updated sample input:

>>> a = "This is my sample text.\r\n\r\n\r\n\r\n\r\n Here start another sample text"
>>> import re
>>> re.sub(r'(\r\n){2,}','\r\n', a)
'This is my sample text.\r\n Here start another sample text'

r'(\r\n)+' would work too, I just like using the 2+ lower bound to avoid some replacements of singleton \r\n substrings with the same substring.

Or you can use the splitlines method on the string and rejoin after filtering:

>>> '\r\n'.join(line for line in a.splitlines() if line)

Upvotes: 1

langton
langton

Reputation: 126

To use a regex to replace multiple occurrences of newline with a single one (or something else you prefer such as a period, tab or whatever), try:

import re
testme = 'Some text.\nSome more text.\n\nEven more text.\n\n\n\n\nThe End'
print re.sub('\n+', '\n', testme)

Note that '\n' is a single-character (a newline), not two characters (literal backslash and 'n').

You can of course compile the regex in advance if you intend to re-use it:

pattern = re.compile('\n+')
print pattern.sub('\n', testme)

Upvotes: 0

user271586
user271586

Reputation:

import re 
a = 'string with \n a \n\n few too many\n\n\n lines'
re.sub('\n+', '\n', a)

Upvotes: 0

jakebrinkmann
jakebrinkmann

Reputation: 795

As an example, if you know what you want to replace:

>>> a = 'string with \n a \n\n few too many\n\n\n lines'
>>> a.replace('\n'*2, '\n') # Replaces \n\n with just \n
'string with \n a \n few too many\n\n lines'
>>> a.replace('\n'*3, '') # Replaces \n\n\n with nothing...
'string with \n a \n\n few too many lines'

Or, using regular expression to find what you want

>>> import re
>>> re.findall(r'.*([\n]+).*', a)
['\n', '\n\n', '\n\n\n']

Upvotes: 0

Related Questions