Jason Brown
Jason Brown

Reputation: 374

pandas groupby aggregation results from another dataframe

I'm having trouble using groupby and aggregation when working w/ multiple dataframes. I'm trying to calculate num_maint_over_$90 from two different dataframes.

cars_dict = {"ABC123": ["Ford", "Compact_Car"], "XYZ148": ["Chevy", "Truck"], "ASX133": ["Ford", "Truck"], "ADS111": ["Porsche", "Sports_Car"], "SSC119": ["Toyota", "Compact_Car"]}
cars = pd.DataFrame.from_dict(cars_dict, orient = 'index')
cars.columns = ["Manufacturer", "Type"]
cars.index.rename("License_Plate", inplace = True)

maintenance_dict = {"License_Plate": ["ABC123", "ABC123", "ABC123", "XYZ148", "ASX133", "ASX133", "ADS111", "ADS111", "SSC119"], "Cost": [60, 100, 200, 150, 40, 199, 33, 99, 0]}
maintenance_records = pd.DataFrame.from_dict(maintenance_dict)
maintenance_records.index.rename("order_num", inplace = True)

* cars: *

              Manufacturer         Type
License_Plate
XYZ148               Chevy        Truck
SSC119              Toyota  Compact_Car
ASX133                Ford        Truck
ABC123                Ford  Compact_Car
ADS111             Porsche   Sports_Car

* maintenance_records: *

           Cost License_Plate
order_num
0            60        ABC123
1           100        ABC123
2           200        ABC123
3           150        XYZ148
4            40        ASX133
5           199        ASX133
6            33        ADS111
7            99        ADS111
8             0        SSC119

*desired df: *

Type         num_maint_over_$90
Compact_Car  2
Sports_Car   1
Truck        2

I tried using groupby, apply(), and for loop.

Upvotes: 1

Views: 4138

Answers (2)

Alicia Garcia-Raboso
Alicia Garcia-Raboso

Reputation: 13913

merged = pd.merge(maintenance_records, cars, how='left',
                  left_on='License_Plate', right_index=True)
merged.query('Cost > 90')['Type'].value_counts()

Upvotes: 2

Jason Brown
Jason Brown

Reputation: 374

Here is the naive for-loop solution:

car_types = {}

for index, row in cars.iterrows():
    car_type = row["Type"]

    if car_type not in car_types:
        car_types[car_type] = 0

for index, row in maintenance_records.iterrows():

    if row["Cost"] > 90:
        car_license = row["License_Plate"]
        car_type = cars.loc[car_license,"Type"]
        car_types[car_type] += 1

df = pd.DataFrame.from_dict(car_types, orient = "index")
df.index.rename("Type", inplace = True)
df.columns = ["num_maint_over_$90"]

* df: *

             num_maint_over_$90
Type
Sports_Car                    1
Compact_Car                   2
Truck                         2

Upvotes: 0

Related Questions