Reputation: 89
I am trying to grab the data from, https://www.espn.com/nhl/standings
When I try to grab it, it is putting Florida Panthers one row to high and messing up the data. All the team names need to be shifted down a row. I have tried to mutate the data and tried,
dataset_one = dataset_one.shift(1)
and then joining with the stats table but I am getting NaN.
The docs seem to show a lot of ways of joining and merging data with similar columns headers but not sure the best solution here without a similar column header to join with.
Code:
import pandas as pd
page = pd.read_html('https://www.espn.com/nhl/standings')
dataset_one = page[0] # Team Names
dataset_two = page[1] # Stats
combined_data = dataset_one.join(dataset_two)
print(combined_data)
Output:
FLAFlorida Panthers GP W L OTL ... GF GA DIFF L10 STRK
0 CBJColumbus Blue Jackets 6 5 0 1 ... 22 16 6 5-0-1 W2
1 CARCarolina Hurricanes 10 4 3 3 ... 24 28 -4 4-3-3 L1
2 DALDallas Stars 6 5 1 0 ... 18 10 8 5-1-0 W4
3 TBTampa Bay Lightning 6 4 1 1 ... 23 14 9 4-1-1 L2
4 CHIChicago Blackhawks 6 4 1 1 ... 19 14 5 4-1-1 W1
5 NSHNashville Predators 10 3 4 3 ... 26 31 -5 3-4-3 W1
6 DETDetroit Red Wings 8 4 4 0 ... 20 24 -4 4-4-0 L1
Desired:
GP W L OTL ... GF GA DIFF L10 STRK
0 FLAFlorida Panthers 6 5 0 1 ... 22 16 6 5-0-1 W2
1 CBJColumbus Blue Jackets 10 4 3 3 ... 24 28 -4 4-3-3 L1
2 CARCarolina Hurricanes 6 5 1 0 ... 18 10 8 5-1-0 W4
3 DALDallas Stars 6 4 1 1 ... 23 14 9 4-1-1 L2
4 TBTampa Bay Lightning 6 4 1 1 ... 19 14 5 4-1-1 W1
5 CHIChicago Blackhawks 10 3 4 3 ... 26 31 -5 3-4-3 W1
6 NSHNashville Predators 8 4 4 0 ... 20 24 -4 4-4-0 L1
7 DETDetriot Red Wings 10 2 6 2 6 ... 20 35 -15 2-6-2 L6
Upvotes: 1
Views: 157
Reputation: 825
Providing an alternative approach to @Noah's answer. You can first add an extra row, shift
the df down by a row and then assign the header col as index 0 value.
import pandas as pd
page = pd.read_html('https://www.espn.com/nhl/standings')
dataset_one = page[0] # Team Names
dataset_two = page[1] # Stats
# Shifting down by one row
dataset_one.loc[max(dataset_one.index) + 1, :] = None
dataset_one = dataset_one.shift(1)
dataset_one.iloc[0] = dataset_one.columns
dataset_one.columns = ['team']
combined_data = dataset_one.join(dataset_two)
Upvotes: 1
Reputation: 2776
Just create the df slightly differently so it knows what is the proper header
dataset_one = pd.DataFrame(page[0], columns=["Team Name"])
Then when you join
it should be aligned properly.
Another alternative is to do the following:
dataset_one = page[0].to_frame(name='Team Name')
Upvotes: 0