RookiePython
RookiePython

Reputation: 145

How do I Split a Cell With Multiple Delimiters and/or Strings?

I scraped table data from https://www.actionnetwork.com/ncaaf/public-betting but it was a jumbled mess in one column, so I'm trying to clean it up after exporting to Excel:

I want to add a column Matchups and split out the two team names from the Scheduled column into their own rows if possible.

If I only split by 'PM' I get the 1st desired output on some, but I still have cells with the AM games, and I would like to split by that as well.

At that point, I'm not sure how to split to return the final desired output. Any ideas?

Before:

enter image description here

1st Desired Output:

enter image description here

Final Desired Output:

enter image description here

Thanks to all who take a look.

import pandas as pd

df = pd.read_excel('ActionNetwork.xlsx')
df['Matchups']= df['Scheduled'].apply(lambda x: x.split('PM | AM')[-1])
print(df)

Upvotes: 1

Views: 340

Answers (1)

tdy
tdy

Reputation: 41327

Since you want to put the matchups back into the same df, the teams need to go into separate columns (otherwise it will throw an error about uneven row counts).

Instead of multiple splits, use a single str.extract with the following patterns:

  • .*[AM|PM] -- match everything up to AM or PM (but don't capture it)
  • ([0-9]+[a-zA-Z ]+) -- capture 1+ numbers and 1+ letters/spaces (away team)
  • ([0-9]+[a-zA-Z ]+) -- capture 1+ numbers and 1+ letters/spaces (home team)
pattern = r'.*[AM|PM] ([0-9]+[a-zA-Z ]+)([0-9]+[a-zA-Z ]+)'
df[['Away', 'Home']] = df['Scheduled'].str.extract(pattern)

#                                          Scheduled             Away          Home
# 0      Fri 10/29, 11:30 PM 113NavyNAVY114TulsaTLSA      113NavyNAVY  114TulsaTLSA
# 1       Sat 10/30, 2:00 AM 114UNLVUNLV116NevadaNEV      114UNLVUNLV  116NevadaNEV
# 2  Sat 10/30, 11:00 PM 110Ole MissMISS118AuburnAUB  110Ole MissMISS  118AuburnAUB

Upvotes: 1

Related Questions