Reputation: 2100
After looking at a few similar questions, I have not been able to successfully implement a substring split on my data. In my case, I have a list of strings, and each string contains a substring I need to extract. The data is NBA positions: I need to pull out the positions (either 'PG', 'SG', 'SF', 'PF', or 'C') from each string. Some strings will have more than one position. Here is the data.
text = ['Chi\xa0SG, SF\xa0\xa0DTD','Cle\xa0PF']
The code should ideally look at the first string, 'Chi\xa0SG, SF\xa0\xa0DTD', and return ['SG','SF'], the two positions. The code should look at the second string and return ['PF'].
Upvotes: 1
Views: 163
Reputation: 125
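heemayl's response is the most robust, but you could probably get away with taking the field after the first non-breaking space ('\xa0'), splitting it on commas, and stripping the whitespace from each piece. Splitting the whole string on commas and keeping the last two characters of each chunk would not work here, because the final chunk ends in 'DTD'.
s = 'Chi\xa0SG, SF\xa0\xa0DTD'
fin = [pos.strip() for pos in s.split('\xa0')[1].split(',')]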
I can't test this at the moment as I'm on a Chromebook, but it should work.
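For reference, here is a minimal sketch of a split-based approach applied to every string in the question's list. It assumes each string is laid out as team, non-breaking space, comma-separated positions, then optional trailing fields; the function name `positions_by_split` is made up for illustration.

```python
def positions_by_split(s):
    """Split-based extraction (assumes: team, NBSP, positions, optional extras)."""
    # Splitting on the non-breaking space puts the position list in field 1,
    # e.g. 'SG, SF' for the first string in the question.
    fields = s.split('\xa0')
    # Split that field on commas and trim the space left after each comma.
    return [pos.strip() for pos in fields[1].split(',')]

text = ['Chi\xa0SG, SF\xa0\xa0DTD', 'Cle\xa0PF']
print([positions_by_split(s) for s in text])  # [['SG', 'SF'], ['PF']]
```

This breaks as soon as a string deviates from that layout (for example, no non-breaking space at all raises an IndexError), which is why a regex is the safer choice.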
Upvotes: 0
Reputation: 42017
Leverage (zero-width) lookarounds, with the alternation wrapped in a non-capturing group so that both lookarounds apply to every alternative rather than only to the first and last:
(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)
(?<!\w) is a zero-width negative lookbehind, making sure the desired match is not preceded by a word character (letter, digit, or underscore)
(?:PG|SG|SF|PF|C) matches any one of the desired positions
(?!\w) is a zero-width negative lookahead, making sure the match is not followed by a word character
Example:
In [6]: import re
In [7]: s = 'Chi\xa0SG, SF\xa0\xa0DTD'
In [8]: re.findall(r'(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)', s)
Out[8]: ['SG', 'SF']
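To run this over the whole list from the question, something like the following sketch should do. The names `POSITION_RE` and `extract_positions` are hypothetical, and the alternation is wrapped in a non-capturing group so that the lookbehind and lookahead guard every alternative.

```python
import re

# Hypothetical names, for illustration only.
# (?:...) groups the alternatives without capturing, so findall returns
# the whole match and both lookarounds apply to each position code.
POSITION_RE = re.compile(r'(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)')

def extract_positions(strings):
    """Return one list of position codes per input string."""
    return [POSITION_RE.findall(s) for s in strings]

text = ['Chi\xa0SG, SF\xa0\xa0DTD', 'Cle\xa0PF']
print(extract_positions(text))  # [['SG', 'SF'], ['PF']]
```

The leading 'C' in 'Chi' and 'Cle' is skipped because it is followed by a word character, while a standalone 'C' for a center would still match.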
Upvotes: 2