LearnToGrow

Reputation: 1768

Regex to split text with hyphen points

Suppose that we have the following string:

'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd'

I want a regex that returns:

- Abcd efgh ejklm
- Efgh-ij sklrm, defasad
- KLMNNOP/QRS dasfdssa eadsd

I wrote this one, which mostly works, but it cuts an item short when it contains a hyphenated word.

import re
# hyphen + whitespace, then any run of the allowed characters (a hyphen is not among them)
regx = r'-\s[\w\s/?,;!:#&@]*'
z = re.findall(regx, 'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd')
for p in z:
    print(p)

- Abcd efgh ejklm 
- Efgh
- KLMNNOP/QRS dasfdssa eadsd

Upvotes: 0

Views: 124

Answers (1)

The fourth bird

Reputation: 163447

You could repeat matching either the current character class or a hyphen followed directly by word characters:

-\s(?:[\w\s/?,;!:#&@]+|-\w+)+

See a regex demo and a Python demo.

Note that the quantifier on the character class is + (one or more) rather than the * used in the question, so the alternation never matches an empty part.
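To make the structure of the pattern easier to follow, here is a minimal sketch of the same regex written with `re.VERBOSE`, so that each part can carry a comment. The name `ITEM` is only illustrative and does not appear in the question.

```python
import re

# The same pattern, spread over several lines with re.VERBOSE.
# ITEM is a hypothetical name chosen for this sketch.
ITEM = re.compile(r"""
    -\s                    # a hyphen followed by one whitespace character starts an item
    (?:
        [\w\s/?,;!:#&@]+   # one or more allowed characters (the hyphen is not among them)
      |
        -\w+               # or a hyphen followed directly by word characters, as in Efgh-ij
    )+                     # repeat either alternative one or more times
""", re.VERBOSE)

text = 'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd'
items = ITEM.findall(text)
for item in items:
    print(item)
```

An item ends at the next hyphen that is followed by a space, because neither alternative can consume it: the character class excludes the hyphen, and `-\w+` requires a word character right after it.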

Example

import re
# raw string, so that \s and \w reach the regex engine unchanged
regx = r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)+'
z = re.findall(regx, 'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd')
for p in z:
    print(p)

Output

- Abcd efgh ejklm 
- Efgh-ij sklrm, defasad 
- KLMNNOP/QRS dasfdssa eadsd
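Because `\s` is part of the character class, every match except the last keeps the space that precedes the next hyphen. If that trailing whitespace is unwanted, one possible approach (a sketch, not part of the original answer) is to strip each match after the fact:

```python
import re

regx = r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)+'
text = 'We need the list fo the following products: - Abcd efgh ejklm - Efgh-ij sklrm, defasad - KLMNNOP/QRS dasfdssa eadsd'

# Remove the whitespace that the character class picks up at the end of each item.
cleaned = [m.strip() for m in re.findall(regx, text)]
for p in cleaned:
    print(p)
```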

Or a bit broader match instead of only word characters:

-\s(?:[\w\s/?,;!:#&@]+|-[\w/?,;!:#&@]+)+
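The two variants only differ when a hyphen inside an item is followed by a non-word character such as `#` or `/`. The sketch below compares them on a made-up input; the string and the names `narrow` and `broad` are assumptions for illustration and are not taken from the question.

```python
import re

narrow = r'-\s(?:[\w\s/?,;!:#&@]+|-\w+)+'
broad = r'-\s(?:[\w\s/?,;!:#&@]+|-[\w/?,;!:#&@]+)+'

# Hypothetical input: the inner hyphens are followed by '#' and '/', not by word characters.
text = 'Parts: - item-#12 blue - cable-/adapter set'

narrow_items = re.findall(narrow, text)
broad_items = re.findall(broad, text)

print(narrow_items)  # the narrow pattern stops at the inner hyphen
print(broad_items)   # the broad pattern keeps the whole item
```

Whitespace is deliberately left out of the class after the hyphen, so a hyphen followed by a space still marks the start of the next item.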

Upvotes: 1
