Reputation: 75
Dataset:
Item_Identifier Item_Weight Item_Fat_Content Item_Visibility \ 0 FDA15 9.30 Low Fat 0.016047 1 DRC01 5.92 Regular 0.019278 2 FDN15 17.50 Low Fat 0.016760 3 FDX07 19.20 Regular 0.065953 4 NCD19 8.93 Low Fat 0.065953 Item_Type Item_MRP Outlet_Identifier \ 0 Dairy 249.8092 OUT049 1 Soft Drinks 48.2692 OUT018 2 Meat 141.6180 OUT049 3 Fruits and Vegetables 182.0950 OUT010 4 Household 53.8614 OUT013 Outlet_Establishment_Year Outlet_Size Outlet_Location_Type \ 0 1999 Medium Tier 1 1 2009 Medium Tier 3 2 1999 Medium Tier 1 3 1998 Medium Tier 3 4 1987 High Tier 3 Outlet_Type Item_Type_new 0 Supermarket Type1 perishable 1 Supermarket Type2 non-perishable 2 Supermarket Type1 perishable 3 Grocery Store perishable 4 Supermarket Type1 non-perishable
Pivotal Table:
Index: Item_Type, Columns: Substring of Item Identifiers, Values: counts.
Expected Output:
DR FD NC Baking Goods 0 1086 0 Breads 0 416 0 Breakfast 0 186 0 Canned 0 1084 0 Dairy 229 907 0 Frozen Foods 0 1426 0 Fruits and Vegetables 0 2013 0 Hard Drinks 362 0 0 Health and Hygiene 0 0 858 Household 0 0 1548 Meat 0 736 0 Others 0 0 280 Seafood 0 89 0 Snack Foods 0 1989 0 Soft Drinks 726 0 0 Starchy Foods 0 269 0
Upvotes: 0
Views: 233
Reputation: 1313
Create a new columns which is sub-string of item Item_Identifier. and then create pivot_table based on them.
Here is the code. (assuming the df is the dataframe with dataset)
df['Item_Identifier_substr'] = df['Item_Identifier'].str.left(2)
pivot_df = df.pivot_table(index = 'Item_Type', columns = 'Item_Identifier_substr', values='Item_Identifier', aggfunc='count')
pivot_df
If you like it, pls vote my answer.
Upvotes: 1