Reputation: 59
I have this string in Python:
1 test11-1-swi-2 2 test11-swi-3 3 26-ca-20-p-3 4 26-ca-20-p-4
5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw 8 pmac-swi
9 pmac-server 10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5
13 27-4c-da-p-13 14 27-4c-da-p-14 15 test11-1500-rtr
I want to split it into separate strings, one per numbered item: 15 in this case, but there could be many more.
I've tried using:
[s.strip() for s in t.split('  ') if s]
This handles items separated by two or more spaces, but some items are separated by only a single space.
One approach that could work: find "1 ", then find "2 ", and once "2 " is found, create a substring from "1 " up to the character before "2".
The result should look like this:
1 test11-1-swi-2
2 test11-swi-3
3 26-ca-20-p-3
4 26-ca-20-p-4
5 test11-labdist-rtr-1
6 test11-labdist-rtr-2
7 pmac-fw
8 pmac-swi
9 pmac-server
10 test11-swi-2
11 test-2400-rtr-6
12 test-2400-rtr-5
13 27-4c-da-p-13
14 27-4c-da-p-14
15 test11-1500-rtr
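A minimal sketch of the find-based approach described in the question, assuming the items are numbered consecutively from 1 and that no name contains whitespace. One catch: a plain find("2 ") would first hit the "-2 " at the end of "test11-1-swi-2", so this sketch normalizes the whitespace and searches for each counter as a space-delimited token (" 2 "). The function name split_numbered is hypothetical.

```python
def split_numbered(text):
    """Split text into "1 ...", "2 ...", ... by finding each successive counter.

    Assumes counters run 1, 2, 3, ... and that names contain no whitespace.
    split_numbered is a hypothetical helper, not a library function.
    """
    # Collapse spaces/newlines to single spaces and pad both ends, so every
    # counter, including the first and the last, appears as " <n> ".
    flat = " " + " ".join(text.split()) + " "
    items = []
    n = 1
    start = flat.find(f" {n} ")
    while start != -1:
        # Look for the next counter as a standalone token after this one.
        nxt = flat.find(f" {n + 1} ", start + 1)
        end = nxt if nxt != -1 else len(flat)
        items.append(flat[start:end].strip())
        start = nxt
        n += 1
    return items


s = """1 test11-1-swi-2 2 test11-swi-3 3 26-ca-20-p-3 4 26-ca-20-p-4
5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw 8 pmac-swi
9 pmac-server 10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5
13 27-4c-da-p-13 14 27-4c-da-p-14 15 test11-1500-rtr"""

for item in split_numbered(s):
    print(item)
```

Unlike the regex answers below, this relies on the numbering being sequential: if a number is missing, everything from the last counter found to the end of the string ends up in a single item.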
Upvotes: 1
Views: 786
Reputation: 16
Use re.findall for simplicity:
import re
# s holds the input string from the question
re.findall(r'\d+\s+.*?(?=\s|$)', s)
Output:
['1 test11-1-swi-2',
'2 test11-swi-3',
'3 26-ca-20-p-3',
'4 26-ca-20-p-4',
'5 test11-labdist-rtr-1',
'6 test11-labdist-rtr-2',
'7 pmac-fw',
'8 pmac-swi',
'9 pmac-server',
'10 test11-swi-2',
'11 test-2400-rtr-6',
'12 test-2400-rtr-5',
'13 27-4c-da-p-13',
'14 27-4c-da-p-14',
'15 test11-1500-rtr']
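This regex means:
\d+: one or more digits
\s+: one or more whitespace characters
.*?: any characters, matched lazily (as few as possible)
(?=\s|$): a lookahead that stops the match just before the next whitespace character or at the end of the string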
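The same pattern can be written with the re.VERBOSE flag so that each part carries its own comment. This is a sketch that assumes the input string from the question; the name ITEM is just an illustrative choice.

```python
import re

# The pattern from this answer, laid out with re.VERBOSE (whitespace and
# "#" comments inside the pattern are ignored).
ITEM = re.compile(r"""
    \d+        # the counter: one or more digits
    \s+        # whitespace between the counter and the name
    .*?        # the name: any characters, as few as possible (lazy)
    (?=\s|$)   # stop just before the next whitespace or at the end of the string
""", re.VERBOSE)

s = ("1 test11-1-swi-2 2 test11-swi-3 3 26-ca-20-p-3 4 26-ca-20-p-4\n"
     "5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw 8 pmac-swi\n"
     "9 pmac-server 10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5\n"
     "13 27-4c-da-p-13 14 27-4c-da-p-14 15 test11-1500-rtr")

items = ITEM.findall(s)
print(items)
```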
Upvotes: 0
Reputation: 79015
Instead of splitting the string, you can extract all the substrings with the regex \d+\s+[^\s]*, where:
\d+: one or more digit characters
\s+: one or more whitespace characters
[^\s]*: zero or more non-whitespace characters
Demo:
import re
from pprint import pprint
s = """
1 test11-1-swi-2 2 test11-swi-3 3 26-ca-20-p-3 4 26-ca-20-p-4
5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw 8 pmac-swi
9 pmac-server 10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5 13 27-4c-da-p-13 14 27-4c-da-p-14 15 test11-1500-rtr
"""
pprint(re.findall(r'\d+\s+[^\s]*', s))
Output:
['1 test11-1-swi-2',
'2 test11-swi-3',
'3 26-ca-20-p-3',
'4 26-ca-20-p-4',
'5 test11-labdist-rtr-1',
'6 test11-labdist-rtr-2',
'7 pmac-fw',
'8 pmac-swi',
'9 pmac-server',
'10 test11-swi-2',
'11 test-2400-rtr-6',
'12 test-2400-rtr-5',
'13 27-4c-da-p-13',
'14 27-4c-da-p-14',
'15 test11-1500-rtr']
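If the counter and the name are needed separately, a variant of the same regex with two capture groups makes re.findall return (counter, name) tuples, which can then be turned into a dict keyed by number. This is a sketch; the names pairs and names are illustrative, and \S+ is used here as shorthand for [^\s] (with + rather than *, so an empty name is not accepted).

```python
import re

s = ("1 test11-1-swi-2 2 test11-swi-3 3 26-ca-20-p-3 4 26-ca-20-p-4\n"
     "5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw 8 pmac-swi\n"
     "9 pmac-server 10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5\n"
     "13 27-4c-da-p-13 14 27-4c-da-p-14 15 test11-1500-rtr")

# With capture groups, findall returns a list of (counter, name) tuples.
pairs = re.findall(r"(\d+)\s+(\S+)", s)

# Map each counter (as an int) to its name.
names = {int(num): name for num, name in pairs}
print(names[10])
```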
Upvotes: 1
Reputation: 75840
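I came up with a re.split() call using this pattern:
\s*(?<!\S)(?=\d+ )
Here \s* consumes the whitespace before a counter, (?<!\S) requires that position to be at the start of the string or right after whitespace, and (?=\d+ ) requires a number followed by a space to come next.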
See an online demo.
import re
s = """
1 test11-1-swi-2 2 test11-swi-3 3 26-ca-20-p-3 4 26-ca-20-p-4
5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw 8 pmac-swi
9 pmac-server 10 test11-2400-oci-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5 13 27-4c-da-p-13 14 27-4c-da-p-14 15 test11-1500-cgbu-rtr"""
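# filter(None, ...) drops the empty strings that re.split() leaves in the result (e.g. before the first item)
lst = list(filter(None, re.split(r'\s*(?<!\S)(?=\d+ )', s)))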
print(lst) # ['1 test11-1-swi-2', '2 test11-swi-3', '3 26-ca-20-p-3', '4 26-ca-20-p-4', '5 test11-labdist-rtr-1', '6 test11-labdist-rtr-2', '7 pmac-fw', '8 pmac-swi', '9 pmac-server', '10 test11-2400-oci-swi-2', '11 test-2400-rtr-6', '12 test-2400-rtr-5', '13 27-4c-da-p-13', '14 27-4c-da-p-14', '15 test11-1500-cgbu-rtr']
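To see why the (?<!\S) lookbehind matters, here is a small comparison on the first two items. It assumes Python 3.7 or later, where re.split() also splits on empty matches. Without the lookbehind, the "2 " at the end of "test11-1-swi-2" looks like the start of a new item.

```python
import re

sample = "1 test11-1-swi-2 2 test11-swi-3"

# Pattern from this answer: a counter must start the string or follow whitespace.
with_lookbehind = list(filter(None, re.split(r'\s*(?<!\S)(?=\d+ )', sample)))

# Same pattern without (?<!\S): a counter may now start right after a
# non-space character, so the "2" ending "test11-1-swi-2" is split off.
without_lookbehind = list(filter(None, re.split(r'\s*(?=\d+ )', sample)))

print(with_lookbehind)
print(without_lookbehind)
```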
Upvotes: 1
Reputation: 163217
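You can use re.split() and split on runs of two or more whitespace characters. Note that items separated by only a single space are not split apart, as the output below shows.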
import re
from pprint import pprint
t = (" 1 test11-1-swi-2 2 test11-swi-3 3 26-ca-20-p-3 4 26-ca-20-p-4 \n"
" 5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw 8 pmac-swi \n"
" 9 pmac-server 10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5\n"
" 13 27-4c-da-p-13 14 27-4c-da-p-14 15 test11-1500-rtr")
res = [s.strip() for s in re.split(r"\s{2,}", t) if s]
pprint(res)
Output:
['1 test11-1-swi-2',
'2 test11-swi-3',
'3 26-ca-20-p-3',
'4 26-ca-20-p-4',
'5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw',
'8 pmac-swi',
'9 pmac-server',
'10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5',
'13 27-4c-da-p-13',
'14 27-4c-da-p-14',
'15 test11-1500-rtr']
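Because a merged item such as '5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw' is easy to miss in a long list, a quick sanity check on the result can help. This is a hypothetical helper, sketched under the assumption that names never contain whitespace; it checks that the pieces are numbered 1..N with exactly one name each.

```python
def check_numbering(items):
    """Return True if items are "1 <name>", "2 <name>", ... with one name each.

    check_numbering is a hypothetical helper; it assumes names contain no spaces.
    """
    for expected, item in enumerate(items, start=1):
        parts = item.split()
        # Each piece must be exactly a counter followed by a single name,
        # and the counter must match its position in the list.
        if len(parts) != 2 or parts[0] != str(expected):
            return False
    return True


good = ['1 test11-1-swi-2', '2 test11-swi-3', '3 26-ca-20-p-3']
# The first five pieces of the \s{2,} output above; the fifth holds three items.
merged = ['1 test11-1-swi-2', '2 test11-swi-3', '3 26-ca-20-p-3',
          '4 26-ca-20-p-4',
          '5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw']

print(check_numbering(good))
print(check_numbering(merged))
```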
Upvotes: 1