Reputation: 2839
I have a string, line:
>>> line = " abc\n def\n\n ghi\n jkl"
>>> print line
 abc
 def

 ghi
 jkl
and I want to convert it to " abcdef\n\n ghijkl", like this:
>>> print " abcdef\n\n ghijkl"
 abcdef

 ghijkl
I tried the Python re module and wrote something like this:
re.sub('(?P<word1>[^\n\s])\n\s*(?P<word2>[^\n\s])', '\g<word1>\g<word2>', line)
but I get this:
>>> re.sub('(?P<word1>[^\n\s])\n\s*(?P<word2>[^\n\s])', '\g<word1>\g<word2>', line)
Out: ' abcdefghijkl'
It seems to me that the \n\s* part is also matching \n\n. Can anyone point out where I went wrong?
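To see where the extra join comes from, here is a small Python 3 sketch that lists every span the pattern from the question matches (re.finditer yields the non-overlapping matches in order):

```python
import re

line = " abc\n def\n\n ghi\n jkl"
pattern = r'(?P<word1>[^\n\s])\n\s*(?P<word2>[^\n\s])'

# Collect the full text of each non-overlapping match, left to right.
matches = [m.group(0) for m in re.finditer(pattern, line)]
print(matches)
```

The second match, 'f\n\n g', is the one that crosses the blank line: \s* matches the second \n as well as the following space.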
Upvotes: 2
Views: 663
Reputation: 120618
You could simplify the regexp if you used \S, which matches any non-whitespace character:
>>> import re
>>> line = " abc\n def\n\n ghi\n jkl"
>>> print re.sub(r'(\S+)\n\s*(\S+)', r'\1\2', line)
 abcdef

 ghijkl
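However, your own regexp fails because your <word1> and <word2> groups only match a single character (i.e. they're not using +). After "c\n d" is joined, the "f" at the end of "def" is still available to start another match, and because \s* also matches newlines, that match spans the blank line ("f\n\n g"). With +, the first match consumes all of "def", so with that simple correction your regexp produces the correct output for this input: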
>>> print re.sub(r'(?P<word1>[^\n\s]+)\n\s*(?P<word2>[^\n\s]+)', r'\g<word1>\g<word2>', line)
 abcdef

 ghijkl
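Because \s* can still match newlines, the + fix relies on the word before a blank line having already been consumed by an earlier match. A quick Python 3 check of the \S+ version; the second input, " abc\n\n def", is a hypothetical case made up for illustration:

```python
import re

pattern = r'(\S+)\n\s*(\S+)'

# Input from the question: the first match consumes all of "def", so no
# match can start right before the blank line, and the blank line survives.
question_result = re.sub(pattern, r'\1\2', " abc\n def\n\n ghi\n jkl")
print(repr(question_result))

# Hypothetical input: nothing consumes "abc" first, so \s* matches "\n "
# (newline included) and the two paragraphs get joined.
edge_result = re.sub(pattern, r'\1\2', " abc\n\n def")
print(repr(edge_result))
```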
Upvotes: 0
Reputation: 86240
Try this,
line = " abc\n def\n\n ghi\n jkl"
print re.sub(r'\n(?!\n)\s*', '', line)
It gives:
 abcdef
ghijkl
It says: "Replace a newline that is NOT followed by another newline, together with any whitespace after it, with nothing."
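A Python 3 sketch that prints the repr of this first version's result, so the remaining newlines are visible:

```python
import re

line = " abc\n def\n\n ghi\n jkl"

# First version: remove each newline that is not immediately followed by
# another newline, together with the whitespace after it.
first = re.sub(r'\n(?!\n)\s*', '', line)
print(repr(first))
```

Only one of the two newlines survives and the space before ghi is removed, so the blank line from the question is lost; the update below keeps it.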
UPDATE: Here's a better version
>>> re.sub(r'([^\n])\n(?!\n)\s*', r'\1', line)
' abcdef\n\n ghijkl'
It gives exactly what you asked for in the question.
Upvotes: 0
Reputation: 336198
\s matches space, \t, \n (and, depending on your regex engine, a few other whitespace characters).
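For example, in Python 3, with a made-up sample string containing a space, a tab and a newline:

```python
import re

sample = "a \tb\nc"

# \s matches the space, the tab and the newline ...
with_s = re.findall(r"\s", sample)
# ... while the explicit class [ \t] matches only spaces and tabs.
spaces_tabs = re.findall(r"[ \t]", sample)
print(with_s, spaces_tabs)
```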
So if you only want to remove single line breaks along with any spaces/tabs that follow them, you can use this:
newline = re.sub(r"(?<!\n)\n[ \t]*(?!\n)", "", line)
Explanation:
(?<!\n) # Assert that the previous character isn't a newline
\n # Match a newline
[ \t]* # Match any number of spaces/tabs
(?!\n) # Assert that the next character isn't a newline
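The explanation above maps directly onto Python's re.VERBOSE flag, which ignores unescaped whitespace and #-comments in a pattern (whitespace inside a character class such as [ \t] is kept). A Python 3 sketch; the constant name JOIN_SINGLE_NEWLINES is just for illustration:

```python
import re

# Same pattern as above, with the explanation kept as inline comments.
JOIN_SINGLE_NEWLINES = re.compile(r"""
    (?<!\n)   # assert that the previous character isn't a newline
    \n        # match a newline
    [ \t]*    # match any number of spaces/tabs
    (?!\n)    # assert that the next character isn't a newline
""", re.VERBOSE)

line = " abc\n def\n\n ghi\n jkl"
newline = JOIN_SINGLE_NEWLINES.sub("", line)
print(repr(newline))
```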
In Python:
>>> line = " abc\n def\n\n ghi\n jkl"
>>> newline = re.sub(r"(?<!\n)\n[ \t]*(?!\n)", "", line)
>>> print newline
 abcdef

 ghijkl
Upvotes: 4