Reputation: 145
I scraped table data from https://www.actionnetwork.com/ncaaf/public-betting but it was a jumbled mess in one column, so I'm trying to clean it up after exporting to Excel:
I want to add a column Matchups
and split out the two team names from the Scheduled
column into their own rows if possible.
If I only split by 'PM' I get the 1st desired output on some, but I still have cells with the AM games, and I would like to split by that as well.
At that point, I'm not sure how to split to return the final desired output. Any ideas?
Before:
1st Desired Output:
Final Desired Output:
Thanks to all who take a look.
import pandas as pd
df = pd.read_excel('ActionNetwork.xlsx')
df['Matchups']= df['Scheduled'].apply(lambda x: x.split('PM | AM')[-1])
print(df)
Upvotes: 1
Views: 340
Reputation: 41327
Since you want to put the matchups back into the same df
, the teams need to go into separate columns (otherwise it will throw an error about uneven row counts).
Instead of multiple splits, use a single str.extract
with the following patterns:
.*[AM|PM]
-- match everything up to AM
or PM
(but don't capture it)([0-9]+[a-zA-Z ]+)
-- capture 1+ numbers and 1+ letters/spaces (away team)([0-9]+[a-zA-Z ]+)
-- capture 1+ numbers and 1+ letters/spaces (home team)pattern = r'.*[AM|PM] ([0-9]+[a-zA-Z ]+)([0-9]+[a-zA-Z ]+)'
df[['Away', 'Home']] = df['Scheduled'].str.extract(pattern)
# Scheduled Away Home
# 0 Fri 10/29, 11:30 PM 113NavyNAVY114TulsaTLSA 113NavyNAVY 114TulsaTLSA
# 1 Sat 10/30, 2:00 AM 114UNLVUNLV116NevadaNEV 114UNLVUNLV 116NevadaNEV
# 2 Sat 10/30, 11:00 PM 110Ole MissMISS118AuburnAUB 110Ole MissMISS 118AuburnAUB
Upvotes: 1