danielvaldes
danielvaldes

Reputation: 45

Check if element is in list, then write to new column in Pandas dataframe if conditions met

Looking at a pandas dataframe containing information on all olympic athletes for past 150 years (Name, Weight, Country, Sport, etc). Available at https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results#athlete_events.csv.

Preview of dataframe

Attempting to make a for loop that iterates through df rows, checks the value stored in the 'Sport' column against several lists and then adds a column to the df with a parent category within the same row. Code so far:

aquatic_sports = ['Swimming','Diving','Synchronized Swimming','Water Polo']
track_sports = ['Athletics','Modern Pentathlon','Triathlon','Biathlon','Cycling']
team_sports = ['Softball','Basketball','Volleyball','Beach Volleyball','Handball','Rugby','Lacrosse']
gymnastic_sports = ['Gymnastics','Rhytmic Gymnastics','Trampolining']
fitness_sports = ['Weightlifting']
combat_sports = ['Boxing','Judo','Wrestling','Taekwondo']
winter_sports = ['Short Track Speed Skating','Ski Jumping','Cross Country Skiing','Luge','Bobsleigh','Alpine Skiing','Curling','Snowboarding','Ice Hocky','Hockey','Speed Skating']

for index, row in df.iterrows():

    if df.iloc[0,11] in aquatic_sports:

        df['Sport Category'] = 'Aquatic Sport'

    elif df.iloc[0,11] in track_sports:

        df['Sport Category'] = 'Track Sport'

    elif df.iloc[0,11] in gymnastic_sports:

        df['Sport Category'] = 'Gymnastic Sport'

    elif df.iloc[0,11] in fitness_sports:

        df['Sport Category'] = 'Fitness Sport'

    elif df.iloc[0,11] in combat_sports:

        df['Sport Category'] = 'Combat Sport'

    elif df.iloc[0,11] in winter_sports:

        df['Sport Category'] = 'Winter Sport'

No errors thrown but unfortunately all values in the new column are the same. Unsure how to pass the current index to ensure each iterations returns a unique, correct value.

Upvotes: 1

Views: 1281

Answers (2)

skott
skott

Reputation: 564

The reason for the same values in the column is because whenever you do df['Sport Category'] = <something> you are setting the entire column to that value. In your code, essentially the column gets updated multiple times, but retains the last value being set.

While setting the value, you can try df.ix[0, 'Sport Category'] = <something> to see if the setting works.

Upvotes: 0

ALollz
ALollz

Reputation: 59579

This is a map, though we need to create the appropriate dictionary. Since you've already created the lists in separate variables, we can instead store them in a dictionary, with the label you want as the key:

d = {
    'Aquatic Sport': ['Swimming', 'Diving','Synchronized Swimming', 'Water Polo'],
    'Track Sports': ['Athletics','Modern Pentathlon', 'Triathlon', 'Biathlon', 'Cycling'],
    'Team Sport': ['Softball', 'Basketball', 'Volleyball', 'Beach Volleyball',
                   'Handball', 'Rugby', 'Lacrosse'],
    'Gymnastic Sport': ['Gymnastics', 'Rhytmic Gymnastics', 'Trampolining'],
    'Fitness Sport': ['Weightlifting'],
    'Combat Sport': ['Boxing','Judo', 'Wrestling', 'Taekwondo'],
    'Winter Sport': ['Short Track Speed Skating', 'Ski Jumping', 'Cross Country Skiing',
                     'Luge','Bobsleigh', 'Alpine Skiing', 'Curling', 'Snowboarding',
                     'Ice Hockey', 'Hockey', 'Speed Skating']
    }

# unpacks lists so it's {sport: category_label}
d = {sport: cat for cat,l in d.items() for sport in l}
df['Sport Category'] = df['Sport'].map(d)

Upvotes: 1

Related Questions