Reputation: 3852
I have a string:
s = 'Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)'
I'm trying to split this up to capture the number of kills and the text before each "XY Kill(s)", to get this output:
['Abc - 33 SR',
'P G - (Type-1P-G)',
'M',
'S - M9A CWS']
Getting the number of kills was simple:
re.findall(r"(\d+) Kill", s)
['11', '2', '1', '1', '11']
Getting the text has been harder. After some research, I tried the following regex, but because a lookahead is zero-width it just returned an empty string at each position where a match could start:
re.findall(r"(?=[0-9]+ Kill)", s)
['', '', '', '', '', '', '']
I then changed this to add in "any number of characters before each group".
re.findall(r"(.+)(?=[0-9]+ Kill)", s)
['Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 1']
This just gives the entire string. How can I adjust this to capture everything before "any number of digits-space-Kill"?
Let's get the dupes out of the way. I've consulted the following. The second in particular looked useful but I've been unable to make it suit this purpose.
Extract Number before a Character in a String Using Python,
How would I get everything before a : in a string Python,
how to get the last part of a string before a certain character?.
Upvotes: 1
Views: 2029
Reputation: 23064
You can use re.split() to get a list of all content between matches.
>>> re.split(r"\d+ Kill\(s\)", s)
['Abc - 33 SR ', ' P G - (Type-1P-G) ', ' M ', ' S - M9A CWS ', ' ', '']
You can clean it up to remove whitespace and empty strings.
>>> [part.strip() for part in re.split(r"\d+ Kill\(s\)", s) if part.strip()]
['Abc - 33 SR', 'P G - (Type-1P-G)', 'M', 'S - M9A CWS']
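If you also want the kill counts from the same call, one option is to put the digits in a capturing group: re.split() then keeps the captured text in the result, alternating with the pieces between matches. This is only a sketch of that idea; the names parts, texts, counts and pairs are made up for illustration.

```python
import re

s = 'Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)'

# The capturing group (\d+) makes re.split() include the digits in the result.
# The \s* on both sides swallows the surrounding whitespace, so no strip() is needed.
parts = re.split(r"\s*(\d+) Kill\(s\)\s*", s)

# parts alternates: text, count, text, count, ..., trailing text
texts = parts[0::2]
counts = parts[1::2]

# zip() stops at the shorter list, which drops the trailing piece after the last match.
pairs = [(text, int(count)) for text, count in zip(texts, counts)]
print(pairs)
```

The last pair has an empty text because nothing precedes the final "11 Kill(s)" in the sample string.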
Upvotes: 1
Reputation: 626738
You may use
re.findall(r'(.*?)\s*(\d+) Kill\(s\)\s*', s)
See the regex demo
Details
(.*?) - Capturing group 1: any 0+ chars other than line break chars, as few as possible
\s* - 0+ whitespaces
(\d+) - Capturing group 2: one or more digits
 Kill\(s\) - a space and the Kill(s) substring
\s* - 0+ whitespaces
import re
rx = r"(.*?)\s*(\d+) Kill\(s\)\s*"
s = "Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)"
print(re.findall(rx, s))
Output:
[('Abc - 33 SR', '11'), ('P G - (Type-1P-G)', '2'), ('M', '1'), ('S - M9A CWS', '1'), ('', '11')]
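The last tuple has an empty first item because the final "11 Kill(s)" has no text in front of it. To get exactly the list asked for in the question, a possible follow-up (a sketch, with labels and kills as illustrative names) is to filter out the empty texts and convert the counts separately:

```python
import re

rx = re.compile(r"(.*?)\s*(\d+) Kill\(s\)\s*")
s = "Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)"

matches = rx.findall(s)  # list of (text, digits) tuples

# Keep only non-empty texts; the trailing "11 Kill(s)" contributes an empty one.
labels = [text for text, _ in matches if text]

# Every match has a count, including the one without a label.
kills = [int(digits) for _, digits in matches]

print(labels)
print(kills)
```

Note that after filtering, labels and kills no longer have the same length, so keep the tuples if you need to know which count belongs to which text.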
Upvotes: 1