Reputation: 357
I know there are questions with similar titles out there, but none of them really answers my question. I have a data frame as below. The "index" column is actually a timestamp. Column A is how many tonnes of material have been dumped into a crusher at each timestamp. Column B is the crushing rate at each timestamp. What I want to know is when a load of material (column A) will be fully crushed, based on the crushing rate (column B).
There are three possible scenarios.
I have tried calculating cumulative values of columns A and B and using merge_asof to perform a fuzzy join (a sketch of that attempt follows the data setup below). It doesn't work as expected, because excess crushing capacity is not carried over: only the crushing rate after a load arrives should count towards that load.
import pandas as pd

A = {'index': range(1, 11), 'A': [300, 0, 0, 400, 0, 0, 0, 0, 150, 0]}
B = {'index': range(1, 11), 'B': [102, 103, 94, 120, 145, 114, 126, 117, 107, 100]}
A = pd.DataFrame(data=A)
B = pd.DataFrame(data=B)
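Roughly, the attempt was the following (a reconstruction; the variable names loads and wrong are mine). It matches the total cumulative load against the total cumulative capacity, so capacity left over from an earlier load is silently carried forward, which is exactly the problem:

A['A_cum'] = A['A'].cumsum()
B['B_cum'] = B['B'].cumsum()
loads = A[A['A'] > 0]
# first timestamp where total capacity catches up with total load;
# leftover capacity from earlier loads is wrongly counted:
wrong = pd.merge_asof(loads, B, left_on='A_cum', right_on='B_cum',
                      direction='forward')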
Here is the expected outcome:
IndexA    A  IndexB  B_accumulate
     1  300       4           419
     4  400       8           502
     9  150      10           207
B_accumulate is the running total of the crushing rate (B), which is reset to 0 once a load of material is fully crushed (i.e. when B_accumulate >= A).
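Spelled out as a plain (slow) loop, the behaviour I'm after is something like the sketch below, assuming a load that arrives while the crusher is busy has to wait, and the timestamp that finishes one load contributes nothing to the next:

rows = []
next_free = 1  # first timestamp at which the crusher can take a new load
for iA, a in zip(A['index'], A['A']):
    if a == 0:
        continue
    acc = 0
    for iB, b in zip(B['index'], B['B']):
        if iB < max(iA, next_free):
            continue
        acc += b
        if acc >= a:  # load fully crushed: record it and reset the total
            rows.append((iA, a, iB, acc))
            next_free = iB + 1
            break
print(pd.DataFrame(rows, columns=['IndexA', 'A', 'IndexB', 'B_accumulate']))

This reproduces the expected table above; what I'm looking for is a proper pandas way to do it.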
Upvotes: 3
Views: 224
Reputation: 2407
I've simplified the structures, using Series instead of DataFrames, and the indices start at zero. cumsum() and searchsorted() do the main work.
import pandas as pd

Load = pd.Series([300, 0, 0, 400, 50, 0, 0, 0, 150, 0])              # aka 'A'
Rate = pd.Series([102, 103, 94, 120, 145, 114, 126, 117, 107, 100])  # aka 'B'

# Storage for the result:
H = []  # [ (indexLoad, Load, indexRate, excess) ... ]

# Find the 1st non-zero load:
load1_idx = len(Load)
for lix in range(len(Load)):
    a = Load[lix]
    if a != 0:
        csumser = Rate.cumsum()
        rix = csumser.searchsorted(a)
        excess = csumser[rix] - a
        H.append((lix, a, rix, excess))
        load1_idx = lix
        break

# Process the remaining loads:
for lix in range(load1_idx + 1, len(Load)):
    a = Load[lix]
    if a == 0:
        continue
    last_rix = H[-1][-2]
    # Rebuild the cumulative sum from the step where the last load finished;
    # leftover capacity is only kept if the new load arrives at that very step:
    csumser[last_rix:] = Rate[last_rix:]
    if lix == last_rix:
        csumser[lix] = H[-1][-1]  # excess
    csumser[last_rix:] = csumser[last_rix:].cumsum()
    rix = csumser[last_rix:].searchsorted(a)
    rix += last_rix
    excess = csumser[rix] - a
    H.append((lix, a, rix, excess))

df = pd.DataFrame(H, columns=["indexLoad", "Load", "indexRate", "rate_excess"])
print(df)
   indexLoad  Load  indexRate  rate_excess
0          0   300          3          119
1          3   400          6          104
2          4    50          6           76
3          8   150          7           93
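The key call is Series.searchsorted, which returns the position at which a value would have to be inserted to keep the series sorted; applied to a cumulative sum, that is the first position where the accumulated rate reaches the load. A minimal illustration:

csum = pd.Series([102, 205, 299, 419, 564])
csum.searchsorted(300)  # -> 3, the first position with csum >= 300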
Upvotes: 1
Reputation:
A possible approach. The problem is split into two parts: first get the actual amount of material waiting to be crushed (which cannot be negative), then analyse the loads (groups of rows during which there is any material to crush).
import numpy as np
import pandas as pd

def get_load(df):
    """ get loaded material minus crushed material """
    current_load = (df['A'] - df['B']).values
    if current_load[0] < 0:
        current_load[0] = 0
    for idx in range(1, len(current_load)):
        correct_value = current_load[idx - 1] + current_load[idx]
        if correct_value < 0:
            current_load[idx] = 0
        else:
            current_load[idx] = correct_value
    return current_load

def get_work_load_chunk_stat(df):
    """ get chunks when material actually crushing """
    if df['load'].sum() == 0:
        return
    ans = pd.DataFrame(
        {'indexA': [df.iloc[0, :]['indexA']],
         'total_load': [df['A'].sum()],
         'loads_qty': [df[df['A'] > 0]['A'].count()],
         'indexB': [df.iloc[-1, :]['indexB']],
         'total_work': [df['B'].sum()]})
    return ans

if __name__ == '__main__':
    A = {'indexA': range(22),
         'A': [0, 300, 0, 0, 0, 0, 400, 0, 100, 0, 0, 0, 300, 0, 0, 0, 0, 400, 0, 100, 0, 0]}
    B = {'indexB': range(22),
         'B': [99, 102, 103, 94, 120, 145, 114, 126, 117, 107, 87, 99, 102, 103, 94, 120, 145, 114, 126, 117, 107, 87]}
    data = pd.concat([pd.DataFrame(data=A), pd.DataFrame(data=B)], axis=1)
    data['load'] = get_load(data)
    data['load_check'] = np.where(data['load'] == 0, 1, 0)
    data['load_check'] = data['load_check'].shift(fill_value=0).cumsum()
    # print(data)
    result = (
        data
        .groupby('load_check')
        .apply(get_work_load_chunk_stat)
        .reset_index(drop=True))
    print(result)
Output:
   indexA  total_load  loads_qty  indexB  total_work
0       1         300          1       4         419
1       6         500          2      10         551
2      12         300          1      15         419
3      17         500          2      21         551
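The load_check column is the usual shift-then-cumsum trick for labelling contiguous runs: idle rows get a 1, and shifting before the cumulative sum makes the group number change on the row after each idle row. A toy illustration with made-up values:

import numpy as np
import pandas as pd

load = pd.Series([0, 300, 198, 78, 0, 400, 286, 0])
check = pd.Series(np.where(load == 0, 1, 0))
print(check.shift(fill_value=0).cumsum().tolist())  # [0, 1, 1, 1, 1, 2, 2, 2]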
Upvotes: 1
Reputation: 56
Create the DataFrame combining A and B:
import numpy as np
import pandas as pd

A = {'index': range(1, 11), 'A': [300, 0, 400, 0, 0, 0, 0, 0, 100, 0]}
B = {'index': range(1, 11), 'B': [102, 103, 94, 120, 145, 114, 126, 117, 107, 87]}
df_A = pd.DataFrame(data=A)
df_B = pd.DataFrame(data=B)
df_com = pd.concat([df_A, df_B], axis=1).drop('index', axis=1)
Create the index columns:
# 1-based positions at which each load arrives:
indexA = list(df_com.A[df_com.A.ne(0)].index + 1)
# 0-based position of the last row before the next load arrives
# (the final load runs to the last row):
indexB = np.array(indexA) - 2
indexB = np.append(indexB[1:], (len(df_com) - 1))
Replace the zeros in column A with the previous load by forward-filling:
df_com['A'] = df_com.A.replace(0,method='pad')
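(On recent pandas versions the method argument of replace is deprecated; an equivalent, masking the zeros and then forward-filling, would be df_com.A.mask(df_com.A.eq(0)).ffill().)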
Group by A and add the index columns:
df_new = df_com.groupby("A", sort=False).apply(lambda x: x.B.shift(1).sum()).reset_index()
df_new['indexA'] = indexA
df_new['indexB'] = indexB
df_new
Upvotes: 1
Reputation: 993
Here is a very verbose solution which I hope is generalisable to your full data. I am sure you can simplify it.
C = A.join(B.set_index('index'), on='index')

# forward-fill each load so every row knows which load is being crushed:
C['A_filled'] = C['A'].replace(to_replace=0, method='ffill')
# number the loads: the cumulative load only changes when a new load arrives:
C['cumul_load'] = C['A'].cumsum()
C['load_number'] = C.groupby('cumul_load').ngroup() + 1
# accumulate the crushing rate within each load:
C['B_accum'] = C.groupby('load_number')['B'].cumsum()
# flag the first row per load on which the accumulated rate exceeds the load:
C['A_fully_crushed'] = C['B_accum'] > C['A_filled']
C['first_index_fully_crushed'] = C.groupby('load_number')['A_fully_crushed'].cumsum() == 1

indexA_ = C['index'][C['A'] > 0].tolist()
A_ = C['A'][C['A'] > 0].tolist()
indexB_ = C['index'][C['first_index_fully_crushed'] == True].tolist()
B_accumulate_ = C['B_accum'][C['first_index_fully_crushed'] == True].tolist()
result = pd.DataFrame({'indexA': indexA_, 'A': A_, 'indexB': indexB_, 'B_accumulate': B_accumulate_})
This yields
   indexA    A  indexB  B_accumulate
0       1  300       4           419
1       6  400       9           464
Upvotes: 1