Finding and extracting multiple substrings in a string?

Question

After looking at a few similar questions, I have not been able to successfully implement a substring split on my data. I have a list of strings, and each string contains one or more NBA positions that I need to extract. The positions are 'PG', 'SG', 'SF', 'PF', or 'C', and some strings have more than one. Here is the data:

text = ['Chi\xa0SG, SF\xa0\xa0DTD','Cle\xa0PF']

The code should ideally look at the first string, 'Chi\xa0SG, SF\xa0\xa0DTD', and return the two positions, ['SG', 'SF']. For the second string it should return ['PF'].

heemayl · Accepted Answer

Leverage (zero width) lookarounds:

(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)

(?<!\w) is a zero-width negative lookbehind, making sure the desired match is not preceded by a word character (alphanumeric or underscore)
PG|SG|SF|PF|C matches any of the desired patterns
(?!\w) is a zero-width negative lookahead, making sure the match is not followed by a word character
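To see why the lookarounds matter, here is a minimal sketch comparing the bare alternation with the guarded pattern. It assumes the pattern has the lookbehind/alternation/lookahead shape described above; the variable names `bare` and `guarded` are illustrative only.

```python
import re

s = 'Chi\xa0SG, SF\xa0\xa0DTD'

# Bare alternation: 'C' also matches the first letter of the team name 'Chi'.
bare = re.findall(r'PG|SG|SF|PF|C', s)

# Guarded with lookarounds: a match may not touch any other word character.
# The non-capturing group (?:...) makes both lookarounds apply to every alternative.
guarded = re.findall(r'(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)', s)

print(bare)     # ['C', 'SG', 'SF']
print(guarded)  # ['SG', 'SF']
```

The non-breaking space '\xa0' is not a word character, so it acts as a boundary just like an ordinary space or comma.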


Example:

In [7]: s = 'Chi\xa0SG, SF\xa0\xa0DTD'

In [8]: re.findall(r'(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)', s)
Out[8]: ['SG', 'SF']
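To cover the whole list from the question rather than a single string, one option is to compile the pattern once and apply it in a list comprehension. This is a sketch, not part of the original answer; the names `POSITION_RE` and `positions` are hypothetical.

```python
import re

# Compile once and reuse; the group is non-capturing, so findall
# returns the whole match for each position found.
POSITION_RE = re.compile(r'(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)')

text = ['Chi\xa0SG, SF\xa0\xa0DTD', 'Cle\xa0PF']

# One list of positions per input string, in the original order.
positions = [POSITION_RE.findall(s) for s in text]

print(positions)  # [['SG', 'SF'], ['PF']]
```

Strings with no position yield an empty list, so the result always lines up index-for-index with the input.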
