Reputation: 247
I have created a table for cricket player Joe Root and his batting statistics for every cricket ground he has played in his international career.
I have used the following script in google colab...
import pandas as pd
import numpy as np
root_grounds = pd.read_html('https://stats.espncricinfo.com/ci/engine/player/303669.html?class=1;template=results;type=batting;view=ground')
root_grounds = root_grounds[3]
root_grounds['Ground'] = root_grounds['Ground'].shift()
root_grounds = root_grounds[root_grounds['Ground'].notnull()]
root_grounds[['Mat', 'Inns', 'NO', 'Runs', 'BF', 'SR', '100', '50', '0', '4s', '6s']] = root_grounds[['Mat', 'Inns', 'NO', 'Runs', 'BF', 'SR', '100', '50', '0', '4s', '6s']].apply(pd.to_numeric)
root_grounds['Ave'] = root_grounds['Runs']/(root_grounds['Inns']-root_grounds['NO'])
root_grounds['Country'] = root_grounds['Ground'].str.split('-').str[-1]
Asia = ['India', 'Sri Lanka', 'U.A.E.', 'Pakistan', 'Bangladesh']
Oceania = ['New Zealand', 'Australia']
Caribbean = ['West Indies']
Europe = ['England']
Africa = ['South Africa']
conditions = [
root_grounds['Country'].isin(Asia),
root_grounds['Country'].isin(Oceania),
root_grounds['Country'].isin(Caribbean),
root_grounds['Country'].isin(Europe),
root_grounds['Country'].isin(Africa),
]
choices = ['Asia', 'Oceania', 'Caribbean', 'Europe', 'Africa']
root_grounds['Continent'] = np.select(conditions, choices)
root_grounds.sort_values('Ave', ascending = False).reset_index(drop=True).head(10)
...to generate the following output so far:
However, I am having trouble generating the values I want from the "Continents" column. For example, I want any country column value that contains 'New Zealand' to equal 'Oceania' under Continent and so on based on the lists created in the script. Is there a way to do it this way or is there a better method to generate the result? Any assistance is truly appreciated. Thanks in advance.
Upvotes: 0
Views: 34
Reputation: 466
It sounds like you want to map each country name to a continent and store that result in root_grounds["Continent"]
. One way of doing this would be to create a dictionary that holds this mapping and then using pandas.Series.map().
If you create a dictionary that looks like this (but with all your countries):
country_to_continent = { "India": "Asia", "New Zealand": "Oceania",
"England": "Europe" }
then you can create the continent column like this:
root_grounds["Continent"] = root_grounds["Country"].map( country_to_continent )
Two notes:
root_grounds["Country"]
that is not in the dictionary, the continent column will contain a null value there.This additional cleaning could use the same dictionary though (a quick but maybe inefficient way of doing this below):
root_grounds["clean_country"] = ""
for country in country_to_continent.keys():
root_grounds["clean_country"] = np.where( root_grounds["Country"].str.contains( country ), country, root_grounds["clean_country"] )
You would then call the map function on root_grounds["clean_country"]
instead of root_grounds["Country"]
.
Upvotes: 1