Reputation: 449
I have 3 dataframes
already sorted with date and p_id
and with no null
values as:
First DataFrame
df1 = pd.DataFrame([['2018-07-05',8.0,1],
['2018-07-15',1.0,1],
['2018-08-05',2.0,1],
['2018-08-05',2.0,2]],
columns=["purchase_date", "qty", "p_id"])
Second DataFrame
df2 = pd.DataFrame([['2018-07-15',2.0,1],
['2018-08-04',7.0,1],
['2018-08-15',1.0,2]],
columns=["sell_date", "qty", "p_id"])
Third DataFrame
df3 = pd.DataFrame([['2018-07-25',1.0,1],
['2018-08-15',1.0,1]],
columns=["expired_date", "qty", "p_id"])
dataframe
looks like:
1st: (Holds Purchase details)
purchase_date qty p_id
0 2018-07-05 8.0 1
1 2018-07-15 1.0 1
2 2018-08-05 2.0 1
3 2018-08-05 2.0 2
2nd: (Holds Sales Details)
sell_date qty p_id
0 2018-07-15 2.0 1
1 2018-08-04 7.0 1
2 2018-08-15 1.0 2
3rd: (Holds Expiry Details)
expired_date qty p_id
0 2018-07-25 1.0 1
1 2018-08-15 1.0 1
Now What I want to do is find when the product that has expired was bought
following FIFO
(product first purchased will expire first)
By date 2018-07-15
We had 8+1 purchased quantity and -2 sold quantity i.e. total of 8+1-2 quantity in stock , -ve
sign signify quantity deduction
By date 2018-07-25
1 quantity expired so first entry for our new when_product_expired
dataframe
will be:
purchase_date expired_date p_id
2018-07-05 2018-07-25 1
By date 2018-08-04
7 quantity were sold out so current quantity will be 8+1-2-7 = 0
By date 2018-08-05
2 quantity were bought so current quantity is 0+2
By date 2018-08-15
1 quantity expired
So a new and final entry will be:
purchase_date expired_date p_id
2018-07-05 2018-07-25 1
2018-08-05 2018-08-15 1
This time the product expired was one that was purchased on 2018-07-25
Actually I have date time, so purchase and sell time will never be equal (you may assume), also before selling and expire, there will always be some quantity of product in stock, i.e. data is consistent
And Thank you in advance :-)
What by now I am thinking is rename all date fields to same field name and append purchase, sell, expired dataframe
with negative sign, but that won't help me
df2.qty = df2.qty*-1
df3.qty=df3.qty*-1
new = pd.concat([df1,df2, df3],sort=False)
.sort_values(by=["purchase_date"],ascending=True)
.reset_index(drop=True)
Upvotes: 0
Views: 181
Reputation: 2887
What you essentially want is this FIFO list of items in stock. In my experience pandas is not the right tool to relate different rows to each other. The workflow should be split-apply-combine. If you split it and don't really see a way how to puzzle it back together, it may be a ill-formulated problem. You can still get a lot done with groupby, but this is something I would not try to solve with some clever trick in pandas. Even if you make it work, it will be hell to maintain.
I don't know how performance critical your problem is (i.e. how large are your Dataframes). If its just a few 10000 entries you can just explicitly loop over the pandas rows (warning: this is slow) and build the fifo list by hand.
I hacked together some code for this. The DateFrame you proposed is in there. I loop over all rows and do bookkeeping how many items are in stock. This is done in a queue q
which contains an element for each item and the element convienently is the purchase_date.
import queue
import pandas as pd
from pandas import Series, DataFrame
# modified (see text)
df1 = pd.DataFrame([['2018-07-05',8.0,1],
['2018-07-15',3.0,1],
['2018-08-05',2.0,1],
['2018-08-05',2.0,2]],
columns=["purchase_date", "qty", "p_id"])
df2 = pd.DataFrame([['2018-07-15',2.0,1],
['2018-08-04',7.0,1],
['2018-08-15',1.0,2]],
columns=["sell_date", "qty", "p_id"])
df3 = pd.DataFrame([['2018-07-25',1.0,1],
['2018-08-15',1.0,1]],
columns=["expired_date", "qty", "p_id"])
df1 = df1.rename(columns={'purchase_date':'date'})
df2 = df2.rename(columns={'sell_date':'date'})
df3 = df3.rename(columns={'expired_date' : 'date'})
df3['qty'] *= -1
df2['qty'] *= -1
df = pd.concat([df1,df2])\
.sort_values(by=["date"],ascending=True)\
.reset_index(drop=True)
# Necessary to distinguish between sold and expried items while looping
df['expired'] = False
df3['expired'] = True
df = pd.concat([df,df3])\
.sort_values(by=["date"],ascending=True)\
.reset_index(drop=True)
#date qty p_id expired
#7-05 8.0 1 False
#7-15 1.0 1 False
#7-15 -2.0 1 False
#7-25 -1.0 1 True
#8-04 -7.0 1 False
#8-05 2.0 1 False
#8-05 2.0 2 False
#8-15 -1.0 2 False
#8-15 -1.0 1 True
# Iteratively build up when_product_expired
when_product_expired = []
# p_id hardcoded here
p_id = 1
# q contains purchase dates for all individual items 'currently' in stock
q = queue.Queue()
for index, row in df[df['p_id'] == p_id].iterrows():
# if items are bought, put as many as 'qty' into q
if row['qty'] > 0:
for tmp in range(int(round(row['qty']))):
date = row['date']
q.put(date)
# if items are sold or expired, remove as many from q.
# if expired additionaly save purchase and expiration date into when_product_expired
elif row['qty'] < 0:
for tmp in range(int(round(-row['qty']))):
purchase_date = q.get()
if row['expired']:
print 'item p_id 1 was bought on', purchase_date
when_product_expired.append([purchase_date, row['date'], p_id])
when_product_expired = DataFrame(when_product_expired, columns=['purchase_date', 'expired_date', 'p_id'])
A few remarks:
I relied on your guarentee that
before selling and expire, there will always be some quantity of product in stock
This is not given for your example DataFrames. Before 2018-07-25 there are 9 items with p_id 1 bought and 9 sold. There is nothing in stock that could expire. I modified df1 so that 11 pieces are bought.
.groupby('p_id').apply(function)
or loop over df['p_id'].unique()
So while this is not scalable solution, I hope it helps you a bit. Good look
Upvotes: 1