hgalvan

Reputation: 59

Find a number followed by a space and split the string

I have this string in Python:

 1 test11-1-swi-2    2 test11-swi-3      3 26-ca-20-p-3     4 26-ca-20-p-4    
 5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw          8 pmac-swi        
 9 pmac-server      10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5
 13 27-4c-da-p-13   14 27-4c-da-p-14   15 test11-1500-rtr

I want to split it into one string per numbered entry: 15 in this case, but there could be many more.

I've tried using:

[s.strip() for s in t.split('  ') if s]

This handles two or more spaces, but some of the substrings are only one space apart.

What could work is:

find "1 ",

then find "2 ",

once "2 " is found, create a substring from "1 " up to the character before "2 ", and repeat for each following number,

e.g.

1 test11-1-swi-2

2 test11-swi-3

3 26-ca-20-p-3

4 26-ca-20-p-4

5 test11-labdist-rtr-1

6 test11-labdist-rtr-2

7 pmac-fw

8 pmac-swi

9 pmac-server

10 test11-swi-2

11 test-2400-rtr-6

12 test-2400-rtr-5

13 27-4c-da-p-13

14 27-4c-da-p-14

15 test11-1500-rtr
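
The find-based idea described above could be sketched roughly like this. It is only a sketch under a few assumptions: the entries are numbered consecutively from 1, names contain no whitespace, and `split_numbered` is a hypothetical helper name. Searching for a bare "2 " would also hit the end of "test11-1-swi-2 ", so the sketch collapses whitespace first and searches for " 2 " with a leading space.

```python
def split_numbered(text):
    """Split text into entries by looking for ' 1 ', ' 2 ', ' 3 ', ... in order."""
    # Collapse every whitespace run to one space and pad both ends, so each
    # entry number can be searched for as " <n> ".
    flat = " " + " ".join(text.split()) + " "
    entries = []
    n = 1
    start = flat.find(" 1 ")
    while start != -1:
        # Look for the next entry number after the current one.
        nxt = flat.find(f" {n + 1} ", start + 1)
        end = nxt if nxt != -1 else len(flat)
        entries.append(flat[start:end].strip())
        start = nxt
        n += 1
    return entries


t = (" 1 test11-1-swi-2    2 test11-swi-3      3 26-ca-20-p-3     4 26-ca-20-p-4    \n"
     " 5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw          8 pmac-swi        \n"
     " 9 pmac-server      10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5\n"
     " 13 27-4c-da-p-13   14 27-4c-da-p-14   15 test11-1500-rtr")

entries = split_numbered(t)
print(len(entries))  # 15
print(entries[0])    # 1 test11-1-swi-2
```

If the numbering can skip values or restart, this loop stops at the first gap, and a regex-based approach is probably the safer choice.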

Upvotes: 1

Views: 786

Answers (4)

Cezary Pukownik

Reputation: 16

Use re.findall for simplicity:

import re
# s is assumed to hold the string from the question
re.findall(r'\d+\s+.*?(?=\s|$)', s)

Output:

['1 test11-1-swi-2',
'2 test11-swi-3',
'3 26-ca-20-p-3',
'4 26-ca-20-p-4',
'5 test11-labdist-rtr-1',
'6 test11-labdist-rtr-2',
'7 pmac-fw',
'8 pmac-swi',
'9 pmac-server',
'10 test11-swi-2',
'11 test-2400-rtr-6',
'12 test-2400-rtr-5',
'13 27-4c-da-p-13',
'14 27-4c-da-p-14',
'15 test11-1500-rtr']

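The regex reads as follows:

  • \d+ matches one or more digits
  • \s+ matches one or more whitespace characters
  • .*? lazily matches any characters (as few as possible)
  • (?=\s|$) is a lookahead that requires whitespace or the end of the string to come next, without consuming it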
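
As a quick check of the breakdown above, here is a small sketch that runs the pattern on the question's text (retyped here as `s`, which is my own reconstruction of the input). It also compares the result with the shorter form `\d+\s+\S*`, which should behave the same way on this input because the lazy `.*?` stops at the first whitespace anyway.

```python
import re

# The question's text, reproduced with its irregular spacing.
s = (" 1 test11-1-swi-2    2 test11-swi-3      3 26-ca-20-p-3     4 26-ca-20-p-4    \n"
     " 5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw          8 pmac-swi        \n"
     " 9 pmac-server      10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5\n"
     " 13 27-4c-da-p-13   14 27-4c-da-p-14   15 test11-1500-rtr")

# Lazy match followed by a lookahead for whitespace or end of string.
lazy = re.findall(r'\d+\s+.*?(?=\s|$)', s)

# Shorter form: number, whitespace, then a run of non-whitespace characters.
simple = re.findall(r'\d+\s+\S*', s)

print(len(lazy))       # 15
print(lazy == simple)  # True
```

Both forms rely on the names containing no whitespace; a name with an embedded space would be cut at that space.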

Upvotes: 0

Arvind Kumar Avinash

Reputation: 79015

Instead of splitting the string, you can extract all the substrings with the regex \d+\s+[^\s]*.

  • \d+: one or more digits
  • \s+: one or more whitespace characters
  • [^\s]*: zero or more non-whitespace characters (equivalent to \S*)

Demo:

import re
from pprint import pprint

s = """
1 test11-1-swi-2 2 test11-swi-3 3 26-ca-20-p-3 4 26-ca-20-p-4
5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw 8 pmac-swi
9 pmac-server 10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5 13 27-4c-da-p-13 14 27-4c-da-p-14 15 test11-1500-rtr
"""

pprint(re.findall(r'\d+\s+[^\s]*', s))

Output:

['1 test11-1-swi-2',
 '2 test11-swi-3',
 '3 26-ca-20-p-3',
 '4 26-ca-20-p-4',
 '5 test11-labdist-rtr-1',
 '6 test11-labdist-rtr-2',
 '7 pmac-fw',
 '8 pmac-swi',
 '9 pmac-server',
 '10 test11-swi-2',
 '11 test-2400-rtr-6',
 '12 test-2400-rtr-5',
 '13 27-4c-da-p-13',
 '14 27-4c-da-p-14',
 '15 test11-1500-rtr']

Upvotes: 1

JvdV

Reputation: 75840

I came up with a re.split() using:

\s*(?<!\S)(?=\d+ )

import re
s = """
1 test11-1-swi-2 2 test11-swi-3 3 26-ca-20-p-3 4 26-ca-20-p-4
5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw 8 pmac-swi
9 pmac-server 10 test11-2400-oci-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5 13 27-4c-da-p-13 14 27-4c-da-p-14 15 test11-1500-cgbu-rtr"""
lst = list(filter(None,re.split(r'\s*(?<!\S)(?=\d+ )', s)))
print(lst) # ['1 test11-1-swi-2', '2 test11-swi-3', '3 26-ca-20-p-3', '4 26-ca-20-p-4', '5 test11-labdist-rtr-1', '6 test11-labdist-rtr-2', '7 pmac-fw', '8 pmac-swi', '9 pmac-server', '10 test11-2400-oci-swi-2', '11 test-2400-rtr-6', '12 test-2400-rtr-5', '13 27-4c-da-p-13', '14 27-4c-da-p-14', '15 test11-1500-cgbu-rtr']
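
To make the split pattern easier to read, here is the same regex written with re.VERBOSE and a comment per piece. This is only an illustrative sketch: the name `SPLIT_AT_ENTRY` and the shortened sample string are mine, and `[ ]` stands in for the literal space because verbose mode ignores unescaped whitespace in the pattern.

```python
import re

SPLIT_AT_ENTRY = re.compile(r"""
    \s*          # swallow any whitespace in front of the split point
    (?<!\S)      # the split point must not follow a non-whitespace character
    (?=\d+[ ])   # and must be followed by digits plus a space (an entry number)
""", re.VERBOSE)

# Shortened sample input (an assumption, not the full text from the question).
s = "1 test11-1-swi-2 2 test11-swi-3\n3 26-ca-20-p-3"

# The pattern can match an empty string, so re.split also returns empty
# pieces; they are dropped here, like filter(None, ...) does above.
parts = [p for p in SPLIT_AT_ENTRY.split(s) if p]
print(parts)
```

The lookbehind is what keeps the trailing "2" in "test11-1-swi-2" from being treated as an entry number, since it is preceded by "-" rather than whitespace.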

Upvotes: 1

The fourth bird

Reputation: 163217

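You can use re.split and match 2 or more whitespace chars. Note that this only splits where entries are separated by at least two whitespace characters, so entries that are a single space apart stay joined, as the output shows.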

import re
from pprint import pprint

t = (" 1 test11-1-swi-2    2 test11-swi-3      3 26-ca-20-p-3     4 26-ca-20-p-4    \n"
     " 5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw          8 pmac-swi        \n"
     " 9 pmac-server      10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5\n"
     " 13 27-4c-da-p-13   14 27-4c-da-p-14   15 test11-1500-rtr")

res = [s.strip() for s in re.split(r"\s{2,}", t) if s]
pprint(res)

Output:

['1 test11-1-swi-2',
 '2 test11-swi-3',
 '3 26-ca-20-p-3',
 '4 26-ca-20-p-4',
 '5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw',
 '8 pmac-swi',
 '9 pmac-server',
 '10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5',
 '13 27-4c-da-p-13',
 '14 27-4c-da-p-14',
 '15 test11-1500-rtr']
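
Since the output above still has entries joined where only one space separates them, one possible adjustment (a sketch, not part of the original answer) is to split on any whitespace run that is followed by a number and more whitespace. It assumes that no name consists of digits only, because such a name would be mistaken for an entry number.

```python
import re
from pprint import pprint

t = (" 1 test11-1-swi-2    2 test11-swi-3      3 26-ca-20-p-3     4 26-ca-20-p-4    \n"
     " 5 test11-labdist-rtr-1 6 test11-labdist-rtr-2 7 pmac-fw          8 pmac-swi        \n"
     " 9 pmac-server      10 test11-swi-2 11 test-2400-rtr-6 12 test-2400-rtr-5\n"
     " 13 27-4c-da-p-13   14 27-4c-da-p-14   15 test11-1500-rtr")

# Split on one or more whitespace characters, but only where the next token
# is a number followed by whitespace. Stripping first avoids an empty
# leading entry.
res = re.split(r"\s+(?=\d+\s)", t.strip())
pprint(res)
print(len(res))  # 15
```

The space between an entry number and its name is not a split point, because the text after it (for example "26-ca-20-p-3") is not digits followed directly by whitespace.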

Upvotes: 1
