How do I Split a Cell With Multiple Delimiters and/or Strings?

Question

I scraped table data from https://www.actionnetwork.com/ncaaf/public-betting but it was a jumbled mess in one column, so I'm trying to clean it up after exporting to Excel:

I want to add a column Matchups and split out the two team names from the Scheduled column into their own rows if possible.

If I only split by 'PM' I get the 1st desired output on some, but I still have cells with the AM games, and I would like to split by that as well.

At that point, I'm not sure how to split to return the final desired output. Any ideas?

Before:

1st Desired Output:

Final Desired Output:

Thanks to all who take a look.

import pandas as pd

df = pd.read_excel('ActionNetwork.xlsx')
df['Matchups']= df['Scheduled'].apply(lambda x: x.split('PM | AM')[-1])
print(df)

tdy · Accepted Answer

Since you want to put the matchups back into the same df, the teams need to go into separate columns (otherwise it will throw an error about uneven row counts).

Instead of multiple splits, use a single str.extract with the following patterns:

.*[AM|PM] -- match everything up to AM or PM (but don't capture it)
([0-9]+[a-zA-Z ]+) -- capture 1+ numbers and 1+ letters/spaces (away team)
([0-9]+[a-zA-Z ]+) -- capture 1+ numbers and 1+ letters/spaces (home team)

pattern = r'.*[AM|PM] ([0-9]+[a-zA-Z ]+)([0-9]+[a-zA-Z ]+)'
df[['Away', 'Home']] = df['Scheduled'].str.extract(pattern)

#                                          Scheduled             Away          Home
# 0      Fri 10/29, 11:30 PM 113NavyNAVY114TulsaTLSA      113NavyNAVY  114TulsaTLSA
# 1       Sat 10/30, 2:00 AM 114UNLVUNLV116NevadaNEV      114UNLVUNLV  116NevadaNEV
# 2  Sat 10/30, 11:00 PM 110Ole MissMISS118AuburnAUB  110Ole MissMISS  118AuburnAUB

How do I Split a Cell With Multiple Delimiters and/or Strings?

Answers (1)

Related Questions