Reputation: 99
In this database I have two columns, one is the product ID associated with the sale, and the other is the quantity sold of that item during that same sale. I am trying to figure out how to get a full tally of each item. There are thousands of sales, so many product IDs are repeated in the ID column.
I am not sure how to approach this to find a solution.
Any help would be greatly appreciated.
Example of the columns:
PRODUCT_ID SLS_QTY
0 1164203101 2
1 72047351000 1
2 3600025824 1
3 7205861079 1
4 82775501058 1
Upvotes: 1
Views: 212
Reputation: 218
First get all unique product ids by doing
all_product_ids = df['PRODUCT_ID'].unique()
This will return a numpy array containing all unique product ids. Next, for each product id you want to return each data instance (row) that has that product id. Then we can store the information in a dictionary.
sales_dict = {}
for product in all_product_ids:
info = df.loc[df['PRODUCT_ID'] == product]
total_sales = sum(info['SLS_QTY'].values)
sales_dict[product] = total_sales
This might not be the most efficient way to do it, but it should get the job done.
Upvotes: 0
Reputation: 323226
You can do it with groupby
then merge
sls = sls.groupby('PRODUCT_ID',as_index=False).SLS_QTY.sum()
Totally = sls.merge(price, on = 'PRODUCT_ID', how = 'left')
Totally['sales']=Totally['SLS_QTY']*Totally['price']
Upvotes: 1