Brian

Reputation: 63

Adding a new column to pandas dataframe based on value of existing column

I have a pandas dataframe like this:

Index                   Resource
2020-07-15 11:59:02     Monkey
2020-07-16 11:59:02     Helicopter
2020-07-17 11:59:02     Forklift
2020-07-18 11:59:02     Airplane
2020-07-19 11:59:02     Dinosaur
2020-07-20 11:59:02     Drone
2020-07-20 11:59:02     Truck
2020-07-20 11:59:02     Airplane
2020-07-22 11:59:02     Truck
2020-07-22 11:59:02     Transport
2020-07-23 11:59:02     Dozer
2020-07-24 11:59:02     Patrol
2020-07-25 11:59:02     Dinosaur

And I want to add a new column named 'Category' like this:

Index                   Resource      Category
2020-07-15 11:59:02     Monkey        Other
2020-07-16 11:59:02     Helicopter    Aviation
2020-07-17 11:59:02     Forklift      Equipment
2020-07-18 11:59:02     Airplane      Aviation
2020-07-19 11:59:02     Dinosaur      Other
2020-07-20 11:59:02     Drone         Aviation
2020-07-20 11:59:02     Truck         Equipment
2020-07-20 11:59:02     Airplane      Aviation
2020-07-22 11:59:02     Truck         Equipment
2020-07-22 11:59:02     Transport     Crew
2020-07-23 11:59:02     Dozer         Equipment
2020-07-24 11:59:02     Patrol        Crew
2020-07-25 11:59:02     Dinosaur      Other

...possibly based upon whether the value of 'Resource' is found in the following lists or not:

aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']

So the value of the new column 'Category' will default to 'Other' if the value of 'Resource' isn't found in the defined lists; otherwise 'Category' gets 'Aviation', 'Equipment', or 'Crew' respectively. (Each 'Resource' belongs to only one 'Category'.)

I'm sure there must be an elegant way to do this in pandas. Can anyone offer advice?

Upvotes: 2

Views: 84

Answers (3)

Yash Gupta

Reputation: 349

You can create a dictionary of lists:

d = {}
d['Aviation'] = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
d['Equipment'] = ['Truck', 'Dozer', 'Forklift', 'Excavator']
d['Crew'] = ['Transport', 'Patrol', 'Stationary']

Then create a function that accepts a value and returns its category:

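def final_pop(resource):
    if resource in d['Aviation']:
        return "Aviation"
    elif resource in d['Equipment']:
        return "Equipment"
    elif resource in d['Crew']:
        return "Crew"
    else:
        # Default for anything not found in the lists ('Other', as in the question)
        return "Other"

# Only the 'Resource' column is needed, so apply the function to that column directly
df['Category'] = df['Resource'].apply(final_pop)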
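
A self-contained sketch of the same idea, in case it helps to run it end to end. It loops over the dictionary instead of writing one branch per category, so adding a category only means adding a key. The small `df` below is sample data made up from a few rows of the question, not the real frame.

```python
import pandas as pd

# Category -> list of resources, same contents as the dictionary above.
d = {
    'Aviation': ['Airplane', 'Helicopter', 'Drone', 'Parachute'],
    'Equipment': ['Truck', 'Dozer', 'Forklift', 'Excavator'],
    'Crew': ['Transport', 'Patrol', 'Stationary'],
}

def final_pop(resource):
    # Return the first category whose list contains the resource.
    for category, resources in d.items():
        if resource in resources:
            return category
    # Nothing matched, so fall back to the default.
    return 'Other'

# Sample data (assumption): a few rows taken from the question.
df = pd.DataFrame({'Resource': ['Monkey', 'Helicopter', 'Forklift', 'Patrol']})
df['Category'] = df['Resource'].map(final_pop)
print(df)
```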

Upvotes: 0

ALollz

Reputation: 59579

Use map to create the category values and .fillna to handle anything that isn't in any list. First, build a dictionary that maps each resource to its category:

d = {resource: category 
     for category, lst in zip(['Aviation', 'Equipment', 'Crew'], [aviation_list, equipment_list, crew_list])
     for resource in lst}

df['Category'] = df['Resource'].map(d).fillna('Other')

                       Resource   Category
Index                                     
2020-07-15 11:59:02      Monkey      Other
2020-07-16 11:59:02  Helicopter   Aviation
2020-07-17 11:59:02    Forklift  Equipment
2020-07-18 11:59:02    Airplane   Aviation
2020-07-19 11:59:02    Dinosaur      Other
2020-07-20 11:59:02       Drone   Aviation
2020-07-20 11:59:02       Truck  Equipment
2020-07-20 11:59:02    Airplane   Aviation
2020-07-22 11:59:02       Truck  Equipment
2020-07-22 11:59:02   Transport       Crew
2020-07-23 11:59:02       Dozer  Equipment
2020-07-24 11:59:02      Patrol       Crew
2020-07-25 11:59:02    Dinosaur      Other
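
To see why the .fillna step is needed, here is a minimal runnable sketch that keeps the intermediate result. The four-row `df` is sample data made up from the question, and the name `mapped` is only for illustration. Series.map with a dictionary returns NaN for any value that is not a key, and .fillna then turns those into 'Other'.

```python
import pandas as pd

aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']

# Flatten the lists into one resource -> category lookup.
d = {resource: category
     for category, lst in zip(['Aviation', 'Equipment', 'Crew'],
                              [aviation_list, equipment_list, crew_list])
     for resource in lst}

# Sample data (assumption): a few rows taken from the question.
df = pd.DataFrame({'Resource': ['Monkey', 'Helicopter', 'Truck', 'Patrol']})

# 'Monkey' is not a key in d, so it maps to NaN here.
mapped = df['Resource'].map(d)

# Replace the missing values with the default category.
df['Category'] = mapped.fillna('Other')
print(df)
```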

Upvotes: 2

sedavidw

Reputation: 11741

You can create a function that takes a Resource value and returns its Category:

def get_category(resource):
    # Sets give constant-time membership checks
    aviation_list = {'Airplane', 'Helicopter', 'Drone', 'Parachute'}
    equipment_list = {'Truck', 'Dozer', 'Forklift', 'Excavator'}
    crew_list = {'Transport', 'Patrol', 'Stationary'}
    if resource in aviation_list:
        return 'Aviation'
    elif resource in equipment_list:
        return 'Equipment'
    elif resource in crew_list:
        return 'Crew'
    else:
        return 'Other'

Then you can create your new column with the following:

# load your data
import pandas as pd
df = pd.read_clipboard() # copied from above

df['Category'] = [get_category(resource) for resource in df['Resource']]

This yields:

In [9]: df
Out[9]:
               Index    Resource   Category
2020-07-15  11:59:02      Monkey      Other
2020-07-16  11:59:02  Helicopter   Aviation
2020-07-17  11:59:02    Forklift  Equipment
2020-07-18  11:59:02    Airplane   Aviation
2020-07-19  11:59:02    Dinosaur      Other
2020-07-20  11:59:02       Drone   Aviation
2020-07-20  11:59:02       Truck  Equipment
2020-07-20  11:59:02    Airplane   Aviation
2020-07-22  11:59:02       Truck  Equipment
2020-07-22  11:59:02   Transport       Crew
2020-07-23  11:59:02       Dozer  Equipment
2020-07-24  11:59:02      Patrol       Crew
2020-07-25  11:59:02    Dinosaur      Other

Quick note: I assumed that each Resource belongs to only one category, so the function returns the first match it finds.
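
If you want to confirm that assumption before relying on it, a quick check along these lines should do. This is only a sketch: the names `counts` and `duplicates` are made up for illustration, and the lists are the ones from the question.

```python
from collections import Counter

aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']

# Count how many different lists each resource appears in.
# set(lst) ignores repeats inside a single list.
counts = Counter(resource
                 for lst in (aviation_list, equipment_list, crew_list)
                 for resource in set(lst))

# Any resource seen in more than one list breaks the assumption.
duplicates = [resource for resource, n in counts.items() if n > 1]
print(duplicates)
```

An empty list means no resource appears in more than one category, so the order of the checks in get_category does not matter.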

Upvotes: 0
