BenjaminFranklinGates

Reputation: 115

Pythonic Way to Split A String Based On SubString

I'm working with college basketball data. The two fields I have right now are the raw matchup and the predicted winner.

RawMatchup            PredictedWinner
MinnesotaLouisville   Louisville

I want to use the Predicted Winner to separate out the two teams in the RawMatchup column. Currently I'm using replace to remove the Predicted Winner from the RawMatchup.

RawMatchup.replace(PredictedWinner, '')
>>Minnesota

This works for the vast majority of the rows in my dataset. The problem I'm having is when both schools partially share a name:

RawMatchup                             PredictedWinner
GeorgiaGeorgia Tech                    Georgia
North Carolina CentralNorth Carolina   North Carolina

Using split for these two rows returns just 'Tech' and 'Central' (instead of 'Georgia Tech' and 'North Carolina Central'). How can I best separate the Predicted Winner from the Raw Matchup while preserving the correct school names?

Upvotes: 1

Views: 49

Answers (1)

Turtlean

Reputation: 579

I wouldn't use split because IMO it's intended for a different purpose (usually splitting elements on standard separators such as commas or whitespace). In this case, what you want is to remove PredictedWinner from RawMatchup only once. Therefore I'd go for str.replace and re.sub to achieve the goal.

It seems that PredictedWinner is either at the end or at the beginning of RawMatchup. We could take advantage of that to define the following function:

import re

def remove_winner_from_raw(raw_matchup, predicted_winner):
    if raw_matchup.endswith(predicted_winner):
        # Escape the name so regex metacharacters (e.g. "." or "(") match literally
        res = re.sub(f"{re.escape(predicted_winner)}$", '', raw_matchup)
    else:
        res = raw_matchup.replace(predicted_winner, '', 1)  # Just the 1st occurrence
    return res

print(remove_winner_from_raw("North Carolina CentralNorth Carolina", "North Carolina"))
# Output: North Carolina Central

print(remove_winner_from_raw("GeorgiaGeorgia Tech", "Georgia"))
# Output: Georgia Tech
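
A regex-free variant of the same idea is also possible. This is only a sketch: it assumes, as above, that PredictedWinner always sits at the very start or the very end of RawMatchup, and the function name split_matchup is hypothetical. Slicing by length avoids regex handling entirely, and it raises an error instead of silently returning a wrong name when the assumption doesn't hold.

```python
def split_matchup(raw_matchup, predicted_winner):
    """Return the other team's name.

    Assumes predicted_winner is a prefix or a suffix of raw_matchup.
    """
    if not predicted_winner:
        raise ValueError("predicted_winner must be a non-empty string")
    if raw_matchup.endswith(predicted_winner):
        # Winner is listed second: drop it from the end
        return raw_matchup[:-len(predicted_winner)]
    if raw_matchup.startswith(predicted_winner):
        # Winner is listed first: drop it from the start
        return raw_matchup[len(predicted_winner):]
    raise ValueError(f"{predicted_winner!r} is not at either end of {raw_matchup!r}")


print(split_matchup("MinnesotaLouisville", "Louisville"))
# Output: Minnesota

print(split_matchup("GeorgiaGeorgia Tech", "Georgia"))
# Output: Georgia Tech

print(split_matchup("North Carolina CentralNorth Carolina", "North Carolina"))
# Output: North Carolina Central
```

The suffix check runs first, mirroring the function above, so a row like "North Carolina CentralNorth Carolina" loses the trailing "North Carolina" rather than the leading one.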


Upvotes: 1
