Reputation: 63
I have a pandas dataframe like this:
Index Resource
2020-07-15 11:59:02 Monkey
2020-07-16 11:59:02 Helicopter
2020-07-17 11:59:02 Forklift
2020-07-18 11:59:02 Airplane
2020-07-19 11:59:02 Dinosaur
2020-07-20 11:59:02 Drone
2020-07-20 11:59:02 Truck
2020-07-20 11:59:02 Airplane
2020-07-22 11:59:02 Truck
2020-07-22 11:59:02 Transport
2020-07-23 11:59:02 Dozer
2020-07-24 11:59:02 Patrol
2020-07-25 11:59:02 Dinosaur
And I want to add a new column named 'Category' like this:
Index Resource Category
2020-07-15 11:59:02 Monkey Other
2020-07-16 11:59:02 Helicopter Aviation
2020-07-17 11:59:02 Forklift Equipment
2020-07-18 11:59:02 Airplane Aviation
2020-07-19 11:59:02 Dinosaur Other
2020-07-20 11:59:02 Drone Aviation
2020-07-20 11:59:02 Truck Equipment
2020-07-20 11:59:02 Airplane Aviation
2020-07-22 11:59:02 Truck Equipment
2020-07-22 11:59:02 Transport Crew
2020-07-23 11:59:02 Dozer Equipment
2020-07-24 11:59:02 Patrol Crew
2020-07-25 11:59:02 Dinosaur Other
...possibly based upon whether the value of 'Resource' is found in the following lists or not:
aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']
So the value of the new column 'Category' will default to 'Other' if the value of 'Resource' isn't found in the defined lists; otherwise 'Category' gets 'Aviation', 'Equipment', or 'Crew' respectively. (Each 'Resource' belongs to only one 'Category'.)
I'm sure there must be an elegant way to do this in pandas. Can anyone offer advice?
Upvotes: 2
Views: 84
Reputation: 349
You can create a dictionary of lists:
d = {}
d['Aviation'] = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
d['Equipment'] = ['Truck', 'Dozer', 'Forklift', 'Excavator']
d['Crew'] = ['Transport', 'Patrol', 'Stationary']
Then create a function that accepts a value and returns its category:
def final_pop(resource):
    if resource in d['Aviation']:
        return "Aviation"
    elif resource in d['Equipment']:
        return "Equipment"
    elif resource in d['Crew']:
        return "Crew"
    else:
        return "Other"  # the question asks for 'Other', not 'Others'
# Apply the function to the 'Resource' column; a row-wise apply is not needed
df['Category'] = df['Resource'].apply(final_pop)
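The if/elif chain above repeats one test per category. As a minimal sketch of the same idea, the function can loop over the dictionary instead, so adding a category only means adding a key. The names lookup_category and demo are hypothetical, and the default is 'Other', the label the question asks for.

```python
import pandas as pd

# Same dictionary of lists as in the answer above.
d = {
    'Aviation': ['Airplane', 'Helicopter', 'Drone', 'Parachute'],
    'Equipment': ['Truck', 'Dozer', 'Forklift', 'Excavator'],
    'Crew': ['Transport', 'Patrol', 'Stationary'],
}

def lookup_category(resource, groups=d, default='Other'):
    """Return the first category whose list contains resource (hypothetical helper)."""
    for category, members in groups.items():
        if resource in members:
            return category
    return default

# Small stand-in frame (hypothetical), not the full data from the question.
demo = pd.DataFrame({'Resource': ['Monkey', 'Drone', 'Dozer', 'Patrol']})
demo['Category'] = demo['Resource'].apply(lookup_category)
```

Because each resource belongs to only one category, returning the first match is enough here.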
Upvotes: 0
Reputation: 59579
Use map to create the category values and .fillna to deal with anything not in any list. First we need to create the dictionary that maps each resource to its category:
d = {resource: category
     for category, lst in zip(['Aviation', 'Equipment', 'Crew'],
                              [aviation_list, equipment_list, crew_list])
     for resource in lst}
df['Category'] = df['Resource'].map(d).fillna('Other')
Resource Category
Index
2020-07-15 11:59:02 Monkey Other
2020-07-16 11:59:02 Helicopter Aviation
2020-07-17 11:59:02 Forklift Equipment
2020-07-18 11:59:02 Airplane Aviation
2020-07-19 11:59:02 Dinosaur Other
2020-07-20 11:59:02 Drone Aviation
2020-07-20 11:59:02 Truck Equipment
2020-07-20 11:59:02 Airplane Aviation
2020-07-22 11:59:02 Truck Equipment
2020-07-22 11:59:02 Transport Crew
2020-07-23 11:59:02 Dozer Equipment
2020-07-24 11:59:02 Patrol Crew
2020-07-25 11:59:02 Dinosaur Other
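For anyone who wants to run this end to end, here is a self-contained sketch of the same map/fillna approach. The lists are copied from the question. The frame sample and the names categories and resource_to_category are assumptions made for illustration, and only a few rows of the original data are included.

```python
import pandas as pd

# Lists copied from the question.
aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']

# Small stand-in frame (hypothetical): a few rows of the question's data.
sample = pd.DataFrame(
    {'Resource': ['Monkey', 'Helicopter', 'Forklift', 'Transport']},
    index=pd.to_datetime(['2020-07-15 11:59:02', '2020-07-16 11:59:02',
                          '2020-07-17 11:59:02', '2020-07-22 11:59:02']),
)

# Invert {category: [resources]} into {resource: category}.
categories = {'Aviation': aviation_list,
              'Equipment': equipment_list,
              'Crew': crew_list}
resource_to_category = {res: cat
                        for cat, lst in categories.items()
                        for res in lst}

# map() leaves NaN for resources missing from the dict; fillna() labels them 'Other'.
sample['Category'] = sample['Resource'].map(resource_to_category).fillna('Other')
```

The lookup is a single dictionary access per row, so it avoids calling a Python function with an if/elif chain for every value.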
Upvotes: 2
Reputation: 11741
You can create a function that takes a Resource value and gives a Category:
def get_category(resource):
    # Sets give fast membership tests
    aviation_list = {'Airplane', 'Helicopter', 'Drone', 'Parachute'}
    equipment_list = {'Truck', 'Dozer', 'Forklift', 'Excavator'}
    crew_list = {'Transport', 'Patrol', 'Stationary'}
    if resource in aviation_list:
        return 'Aviation'
    elif resource in equipment_list:
        return 'Equipment'
    elif resource in crew_list:
        return 'Crew'
    else:
        return 'Other'
Then you can create your new column with the following:
# load your data
import pandas as pd
df = pd.read_clipboard() # copied from above
df['Category'] = [get_category(resource) for resource in df['Resource']]
This yields:
In [9]: df
Out[9]:
Index Resource Category
2020-07-15 11:59:02 Monkey Other
2020-07-16 11:59:02 Helicopter Aviation
2020-07-17 11:59:02 Forklift Equipment
2020-07-18 11:59:02 Airplane Aviation
2020-07-19 11:59:02 Dinosaur Other
2020-07-20 11:59:02 Drone Aviation
2020-07-20 11:59:02 Truck Equipment
2020-07-20 11:59:02 Airplane Aviation
2020-07-22 11:59:02 Truck Equipment
2020-07-22 11:59:02 Transport Crew
2020-07-23 11:59:02 Dozer Equipment
2020-07-24 11:59:02 Patrol Crew
2020-07-25 11:59:02 Dinosaur Other
Quick note: I assumed that each Resource can belong to only one category, so the function simply returns the first match it finds.
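If you want to verify that assumption rather than rely on it, a small check along these lines reports any resource that appears in more than one list. This is only a sketch: find_overlaps and groups are hypothetical names, and the lists are the ones from the question.

```python
from itertools import combinations

# Category lists from the question, keyed by category name.
groups = {
    'Aviation': ['Airplane', 'Helicopter', 'Drone', 'Parachute'],
    'Equipment': ['Truck', 'Dozer', 'Forklift', 'Excavator'],
    'Crew': ['Transport', 'Patrol', 'Stationary'],
}

def find_overlaps(groups):
    """Return {(category_a, category_b): shared_resources} for every overlapping pair."""
    overlaps = {}
    for (name_a, members_a), (name_b, members_b) in combinations(groups.items(), 2):
        shared = set(members_a) & set(members_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps

# An empty dict means every resource belongs to exactly one category.
overlaps = find_overlaps(groups)
```

With the lists as given, the result is empty, so the order of the if/elif branches does not affect the outcome.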
Upvotes: 0