Reputation: 1181
Given text like this:
text = " \t hello there\n \t how are you?\n \t HHHH"
hello there
how are you?
HHHH
Could I get the common prefix substring through regex?
I tried:
In [36]: re.findall(r"(?m)(?:(^[ \t]+).+[\n\r]+\1)", " \t hello there\n \t how are you?\n \t HHHH")
Out[36]: [' \t ']
But apparently that common prefix substring is ' \t '
I want to use it for a dedent function like the one in Python's textwrap module.
Upvotes: 0
Views: 385
Reputation: 12164
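import os

# os.path.commonprefix compares strings character by character,
# so it is not just for paths...
text = " \t hello there\n \t how are you?\n \t HHHH"
li = text.split("\n")
common = os.path.commonprefix(li)
# Drop the shared prefix from every line.
li = [i[len(common):] for i in li]
for i in li:
    print(i)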
=>
hello there
how are you?
HHHH
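Note that commonprefix returns whatever the lines share, not only whitespace. As a rough sketch (dedent_common is a hypothetical name, and skipping blank lines is my assumption about the desired behavior), the same idea can be limited to leading spaces and tabs:

```python
import os

def dedent_common(text):
    """Strip the leading whitespace shared by all non-blank lines (hypothetical helper)."""
    lines = text.split("\n")
    # Blank lines say nothing about indentation, so leave them out of the comparison.
    content = [line for line in lines if line.strip()]
    common = os.path.commonprefix(content)
    # Keep only the whitespace part of the shared prefix.
    indent = common[:len(common) - len(common.lstrip(" \t"))]
    return "\n".join(
        line[len(indent):] if line.startswith(indent) else line
        for line in lines
    )

print(dedent_common(" \t hello there\n \t how are you?\n \t HHHH"))
```

Lines that do not start with the shared indent (blank lines, for instance) are passed through unchanged.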
Upvotes: 0
Reputation: 214959
Here's an expression that finds a common prefix in a text:
r'^(.+).*(\n\1.*)*$'
Example:
import re

text = (
    "No Red Leicester\n"
    "No Tilsit\n"
    "No Red Windsor"
)
# Group 1 is the longest prefix that every line starts with.
m = re.match(r'^(.+).*(\n\1.*)*$', text)
if m:
    print('common prefix is', m.group(1))
else:
    print('no common prefix')
Note that this expression involves a lot of backtracking, so use it wisely, especially on large inputs.
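To make the pieces of that expression easier to follow, here is the same pattern spelled out with re.VERBOSE. This is only an annotated sketch; COMMON_PREFIX and common_prefix are names I made up for illustration.

```python
import re

# Same pattern as above, written in verbose mode so each part can carry a comment.
COMMON_PREFIX = re.compile(r"""
    ^(.+)        # group 1: candidate prefix, tried longest first
    .*           # rest of the first line
    (\n\1.*)*    # every following line must start with the same prefix
    $            # and the match has to reach the end of the text
""", re.VERBOSE)

def common_prefix(text):
    """Return the longest prefix shared by all lines, or '' if there is none."""
    m = COMMON_PREFIX.match(text)
    return m.group(1) if m else ""

print(repr(common_prefix("No Red Leicester\nNo Tilsit\nNo Red Windsor")))
```

The backtracking comes from group 1: the engine shortens the candidate one character at a time until all remaining lines start with it.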
To find the longest common whitespace prefix, find the leading whitespace of every line and take the minimum of the lengths:
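def dedent(text):
    # Leading whitespace of every non-blank line ('' for unindented lines).
    indents = re.findall(r'(?m)^[ \t]*(?=\S)', text)
    # Only lengths are compared, so this assumes all lines are indented
    # with the same kind of characters.
    prefix_len = min(map(len, indents)) if indents else 0
    return re.sub(r'(?m)^.{%d}' % prefix_len, '', text)

text = (
    " No Red Leicester\n"
    " No Tilsit\n"
    "\t\t No Red Windsor"
)
print(dedent(text))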
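For comparison, this is how the standard library's textwrap.dedent (the function the question mentions) behaves. The sample strings are my own; the point is that textwrap.dedent treats tabs and spaces as different characters, while a length-based approach does not.

```python
import textwrap

# Same kind of indentation on every line: the shared four spaces are removed.
spaces_only = "    No Red Leicester\n    No Tilsit\n      No Red Windsor"
print(textwrap.dedent(spaces_only))

# Spaces on one line, a tab on the other: there is no common leading
# whitespace, so the text comes back unchanged.
mixed = "  No Tilsit\n\tNo Red Windsor"
print(repr(textwrap.dedent(mixed)))
```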
Upvotes: 1
Reputation:
I'm not that good with Python, so this code may not look idiomatic, but algorithmically it should be sound:
>>> import io
>>> def strip_common_prefix(text):
...     # Start with the whole first line as the candidate prefix.
...     position = text.find('\n')
...     offset = position
...     match = text[: position + 1]
...     lines = [match]
...     # Keep reading lines even after the prefix becomes empty, so none are dropped.
...     while position + 1 < len(text):
...         next_line = text.find('\n', position + 1)
...         if next_line == -1:
...             next_line = len(text)
...         line = text[position + 1 : next_line + 1]
...         position = next_line
...         lines.append(line)
...         # Count the leading characters this line shares with the prefix so far.
...         i = 0
...         for a, b in zip(line, match):
...             if i >= offset or a != b:
...                 break
...             i += 1
...         offset = i
...         match = line[: offset]
...     # Write every line back without the common prefix.
...     buf = io.StringIO()
...     for line in lines:
...         if not match:
...             buf.write(line)
...         else:
...             buf.write(line[offset :])
...     text = buf.getvalue()
...     buf.close()
...     return text
...
>>> strip_common_prefix(" \t hello there\n \t how are you?\n \t HHHH")
'hello there\nhow are you?\nHHHH'
>>>
A regular expression will add a lot of overhead compared to this.
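The same line-by-line narrowing can probably be written more compactly by splitting the text first. This is just a sketch of the idea, not a tuned implementation, and strip_common_prefix_compact is a made-up name:

```python
def strip_common_prefix_compact(text):
    """Remove the prefix shared by all lines (hypothetical compact variant)."""
    lines = text.split("\n")
    prefix = lines[0]
    for line in lines[1:]:
        # Shrink the candidate prefix to the part this line shares with it.
        n = 0
        for a, b in zip(prefix, line):
            if a != b:
                break
            n += 1
        prefix = prefix[:n]
        if not prefix:
            # Nothing in common any more, so there is no point in looking further.
            break
    return "\n".join(line[len(prefix):] for line in lines)

print(repr(strip_common_prefix_compact(" \t hello there\n \t how are you?\n \t HHHH")))
```

One thing to keep in mind: a trailing newline produces an empty last line, which leaves no common prefix at all, so nothing gets stripped in that case.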
Upvotes: 0