Pull Column Value based on multiple list conditions

Question

I have created a table for cricket player Joe Root and his batting statistics for every cricket ground he has played in his international career.

I have used the following script in google colab...

import pandas as pd
import numpy as np

root_grounds = pd.read_html('https://stats.espncricinfo.com/ci/engine/player/303669.html?class=1;template=results;type=batting;view=ground')

root_grounds = root_grounds[3]

root_grounds['Ground'] = root_grounds['Ground'].shift()

root_grounds = root_grounds[root_grounds['Ground'].notnull()]
root_grounds[['Mat', 'Inns', 'NO', 'Runs', 'BF', 'SR', '100', '50', '0', '4s', '6s']] = root_grounds[['Mat', 'Inns', 'NO', 'Runs', 'BF', 'SR', '100', '50', '0', '4s', '6s']].apply(pd.to_numeric)
root_grounds['Ave'] = root_grounds['Runs']/(root_grounds['Inns']-root_grounds['NO'])

root_grounds['Country'] = root_grounds['Ground'].str.split('-').str[-1]

Asia = ['India', 'Sri Lanka', 'U.A.E.', 'Pakistan', 'Bangladesh']
Oceania = ['New Zealand', 'Australia']
Caribbean = ['West Indies']
Europe = ['England']
Africa = ['South Africa']

conditions = [
    root_grounds['Country'].isin(Asia),
    root_grounds['Country'].isin(Oceania),
    root_grounds['Country'].isin(Caribbean),
    root_grounds['Country'].isin(Europe),
    root_grounds['Country'].isin(Africa),
]

choices = ['Asia', 'Oceania', 'Caribbean', 'Europe', 'Africa']

root_grounds['Continent'] = np.select(conditions, choices)

root_grounds.sort_values('Ave', ascending = False).reset_index(drop=True).head(10)

...to generate the following output so far:

However, I am having trouble generating the values I want from the "Continents" column. For example, I want any country column value that contains 'New Zealand' to equal 'Oceania' under Continent and so on based on the lists created in the script. Is there a way to do it this way or is there a better method to generate the result? Any assistance is truly appreciated. Thanks in advance.

Leo · Accepted Answer

It sounds like you want to map each country name to a continent and store that result in root_grounds["Continent"]. One way of doing this would be to create a dictionary that holds this mapping and then using pandas.Series.map().

If you create a dictionary that looks like this (but with all your countries):

country_to_continent = { "India": "Asia", "New Zealand": "Oceania",
                         "England": "Europe" }

then you can create the continent column like this:

root_grounds["Continent"] = root_grounds["Country"].map( country_to_continent )

Two notes:

If there is a country in root_grounds["Country"] that is not in the dictionary, the continent column will contain a null value there.
If your data is not clean and you truly need to do the above but for country values where, for example, the name contains "New Zealand" but is not exactly equal to it, then you would first need to do some additional cleaning.

This additional cleaning could use the same dictionary though (a quick but maybe inefficient way of doing this below):

root_grounds["clean_country"] = ""
for country in country_to_continent.keys():
    root_grounds["clean_country"] = np.where( root_grounds["Country"].str.contains( country ), country, root_grounds["clean_country"] )

You would then call the map function on root_grounds["clean_country"] instead of root_grounds["Country"].

Pull Column Value based on multiple list conditions

Answers (1)

Related Questions