Reputation: 87
I have strings that can contain a varying number of "groups". I need to split them, but I am having trouble doing so. Each group always starts with [A-Z]{2,5} followed by a : and a string of varying length that may contain spaces. There is always a space in front of each group.
Example strings:
"YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"
"Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs dsfdsfd1jkhjk23"
My code thus far:
import re
D = "Test1 AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"
g = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(D)
This works when the string starts with a single word, but not when it starts with multiple words.
Upvotes: 4
Views: 2314
Reputation: 67968
([A-Z]{2,5}:\w+(?: +\w+)*)(?=(?: +[A-Z]+:|$))
You can also use re.findall directly with this pattern.
See the demo: https://regex101.com/r/6jf8EM/1
This way you don't need to filter unwanted groups later. You get what you need.
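As a rough sketch of how this pattern behaves with re.findall on the example strings from the question (the helper name find_groups is made up for illustration). Note that findall only returns the KEY:value groups, so the leading text such as "Billy Bob Thorton" is not part of the result:

```python
import re

# Pattern from the answer above: a 2-5 letter uppercase key, a colon,
# then one or more words, stopping before the next " KEY:" or end of string.
GROUP_RE = re.compile(r"([A-Z]{2,5}:\w+(?: +\w+)*)(?=(?: +[A-Z]+:|$))")


def find_groups(text):
    """Return the KEY:value groups found in text (hypothetical helper)."""
    return GROUP_RE.findall(text)


print(find_groups("YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"))
# ['AA:Hello', 'AB:1234', 'AC:1F 322', 'AD:hj21jkhjk23']

print(find_groups("Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs dsfdsfd1jkhjk23"))
# ['AA:213231', 'AB:aaaa', 'AC:ddddd 322', 'AD:hj2ffs dsfdsfd1jkhjk23']
```

If the leading text is needed as well, it has to be extracted separately, for example by taking everything before the first match.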
Upvotes: 1
Reputation: 626690
You can use
re.split(r'(?!^)\s+(?=[A-Z]+:)', text)
See this regex demo.
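Applied to the two example strings from the question, the split could look like the sketch below (split_groups is a hypothetical helper name, not something from the question). Unlike a findall-based approach, the leading text is kept as the first item:

```python
import re

# Split on whitespace that is not at the start of the string and that is
# immediately followed by uppercase letters and a colon (the next group key).
SPLIT_RE = re.compile(r"(?!^)\s+(?=[A-Z]+:)")


def split_groups(text):
    """Split text into its leading part and the KEY:value groups."""
    return SPLIT_RE.split(text)


print(split_groups("YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"))
# ['YellowSky', 'AA:Hello', 'AB:1234', 'AC:1F 322', 'AD:hj21jkhjk23']

print(split_groups("Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs dsfdsfd1jkhjk23"))
# ['Billy Bob Thorton', 'AA:213231', 'AB:aaaa', 'AC:ddddd 322', 'AD:hj2ffs dsfdsfd1jkhjk23']
```

The spaces inside "Billy Bob Thorton" and before "322" are left alone because what follows them is not a run of uppercase letters ending in a colon.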
Details:
(?!^) - a negative lookahead that matches a location not at the start of the string (equal to (?<!^) but one char shorter)
\s+ - one or more whitespace characters
(?=[A-Z]+:) - a positive lookahead that requires one or more uppercase ASCII letters followed by a : char immediately to the right of the current location.
Upvotes: 2