Reputation: 3141
I have a string like this:
x = 'dir\tsubdir1\t\tfile1.ext\t\tsubsubdir1\tsubdir2\t\tsubsubdir2\t\t\tfile2.ext'
How can I split this string into the following list in a Pythonic way?
dir
\tsubdir1
\t\tfile1.ext
\t\tsubsubdir1
\tsubdir2
\t\tsubsubdir2
\t\t\tfile2.ext
['dir', '\tsubdir1', '\t\tfile1.ext', '\t\tsubsubdir1', '\tsubdir2', '\t\tsubsubdir2', '\t\t\tfile2.ext']
Proof of concept:
x = r'dir\tsubdir1\t\tfile1.ext\t\tsubsubdir1\tsubdir2\t\tsubsubdir2\t\t\tfile2.ext'
y = x.split(r'\t')
print(y)
Upvotes: 1
Views: 282
Reputation: 2407
Another regex solution, using re.findall():
import re
x = 'dir\tsubdir1\t\tfile1.ext\t\tsubsubdir1\tsubdir2\t\tsubsubdir2\t\t\tfile2.ext'
re.findall(r"\t+[^\t]+|[^\t]+", x)
Out:
['dir',
'\tsubdir1',
'\t\tfile1.ext',
'\t\tsubsubdir1',
'\tsubdir2',
'\t\tsubsubdir2',
'\t\t\tfile2.ext']
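As a side note, the two alternatives in that pattern seem to collapse into a single branch, since "one or more tabs, or none" is just "zero or more tabs". A minimal sketch to check that on the sample string (the names two_branch and one_branch are only for illustration):

```python
import re

x = "dir\tsubdir1\t\tfile1.ext\t\tsubsubdir1\tsubdir2\t\tsubsubdir2\t\t\tfile2.ext"

# Original pattern: a run of tabs followed by non-tabs, or non-tabs alone.
two_branch = re.findall(r"\t+[^\t]+|[^\t]+", x)

# Single-branch variant: optional run of tabs followed by non-tabs.
one_branch = re.findall(r"\t*[^\t]+", x)

print(two_branch == one_branch)
```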
Upvotes: 0
Reputation: 1033
import re
x = 'dir\tsubdir1\t\tfile1.ext\t\tsubsubdir1\tsubdir2\t\tsubsubdir2\t\t\tfile2.ext'
s = re.sub('([^\t])\t', '\\1\n\t', x).split('\n')
print(s)
output:
['dir', '\tsubdir1', '\t\tfile1.ext', '\t\tsubsubdir1', '\tsubdir2', '\t\tsubsubdir2', '\t\t\tfile2.ext']
Upvotes: 0
Reputation: 51683
You can do this by visiting each character of your path
input once, plus a list comprehension:
path = "dir\tsubdir1\t\tfile1.ext\t\tsubsubdir1\tsubdir2\t\tsubsubdir2\t\t\tfile2.ext"
l = [[]]
for c in path:
    if c != "\t":  # append to the last element of the list if not a \t
        l[-1].append(c)
    elif not l[-1] or l[-1][-1] == "\t":  # also append if the last element is empty or ends with a \t
        l[-1].append(c)  # (you could 'or' it into the if before)
    else:
        l.append([])  # else create a new "word" and append the \t
        l[-1].append(c)
l = [''.join(elem) for elem in l]  # join the things back together
print(l)
Output:
['dir',
'\tsubdir1',
'\t\tfile1.ext',
'\t\tsubsubdir1',
'\tsubdir2',
'\t\tsubsubdir2',
'\t\t\tfile2.ext']
Before the join-step the accumulated lists look like this:
[['d', 'i', 'r'],
['\t', 's', 'u', 'b', 'd', 'i', 'r', '1'],
['\t', '\t', 'f', 'i', 'l', 'e', '1', '.', 'e', 'x', 't'],
['\t', '\t', 's', 'u', 'b', 's', 'u', 'b', 'd', 'i', 'r', '1'],
['\t', 's', 'u', 'b', 'd', 'i', 'r', '2'],
['\t', '\t', 's', 'u', 'b', 's', 'u', 'b', 'd', 'i', 'r', '2'],
['\t', '\t', '\t', 'f', 'i', 'l', 'e', '2', '.', 'e', 'x', 't']]
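Avoid building the pieces by repeated string concatenation: strings are immutable, so each addition creates an intermediate "throw-away" string instance, which slows things down. Accumulating characters in lists and joining once at the end is much faster and less of a strain on memory.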
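For reuse, the same idea can be folded into a small function with the two "append" branches merged, as hinted in the comment in the loop. This is only a sketch; the name split_on_tab_runs is made up for illustration:

```python
def split_on_tab_runs(path):
    """Split path so that each piece keeps its leading run of tabs."""
    parts = [[]]
    for c in path:
        # Start a new piece only when a tab follows a non-tab character.
        if c == "\t" and parts[-1] and parts[-1][-1] != "\t":
            parts.append([])
        parts[-1].append(c)
    # Join each list of characters once, at the end.
    return ["".join(p) for p in parts]


path = "dir\tsubdir1\t\tfile1.ext\t\tsubsubdir1\tsubdir2\t\tsubsubdir2\t\t\tfile2.ext"
print(split_on_tab_runs(path))
```

The extra check on parts[-1] being non-empty is there so that an input starting with a tab does not raise an IndexError.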
Upvotes: 2
Reputation: 37297
Maybe use a regular expression?
>>> import regex
>>> L = regex.split(r"(?<!\t)\t", "dir\tsubdir1\t\tfile1.ext\t\tsubsubdir1\tsubdir2\t\tsubsubdir2\t\t\tfile2.ext")
>>> L
['dir', 'subdir1', '\tfile1.ext', '\tsubsubdir1', 'subdir2', '\tsubsubdir2', '\t\tfile2.ext']
>>> L[:1] + ['\t' + i for i in L[1:]]
['dir', '\tsubdir1', '\t\tfile1.ext', '\t\tsubsubdir1', '\tsubdir2', '\t\tsubsubdir2', '\t\t\tfile2.ext']
The regular expression is
(?<!\t)\t
which means "a tab that is not preceded by another tab", so the first tab in every run of tabs is matched by the regex and used as the split point.
After splitting, each item after the first has lost one tab, so the last line, L[:1] + ['\t' + i for i in L[1:]], prepends the missing tab.
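If Python 3.7 or later can be assumed (that is when re.split started splitting on empty matches), the prepend step can probably be skipped by splitting on a zero-width position rather than on the tab itself. A minimal sketch with the standard-library re module; the pattern is a variation of mine, not the one used above:

```python
import re

x = "dir\tsubdir1\t\tfile1.ext\t\tsubsubdir1\tsubdir2\t\tsubsubdir2\t\t\tfile2.ext"

# Split at every position that has a non-tab character before it and a tab
# after it. Nothing is consumed, so each piece keeps all of its tabs.
parts = re.split(r"(?<=[^\t])(?=\t)", x)
print(parts)
```

Using (?<=[^\t]) instead of (?<!\t) means the very start of the string never counts as a split point, so an input that begins with a tab should not produce an empty first item.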
Upvotes: 3