Reputation: 707
I have several dataframes. Below is an example of each of them.
df_min
scale code R1 R2 ...
1 121 50 30
2 121 35 45
3 121 40 50
4 121 20 30
5 121 20 35
1 313 10 7
2 313 13 10
3 313 10 12
4 313 15 8
5 313 17 10
...
df_rate
scale code R1 R2 ...
1 121 20 40
2 121 30 20
3 121 20 30
4 121 15 40
5 121 10 30
1 313 10 5
2 313 15 10
3 313 25 10
4 313 10 15
5 313 20 5
...
df_max
scale code R1 R2 ...
1 121 30 200
2 121 100 175
3 121 70 100
4 121 80 90
5 121 75 35
1 313 60 70
2 313 35 70
3 313 50 60
4 313 50 45
5 313 45 68
...
df_stock
code R1 R2 ...
121 100 150
313 70 65
....
df_new
scale code R1 R2 ...
1 121 NaN NaN
2 121 NaN NaN
3 121 NaN NaN
4 121 NaN NaN
5 121 NaN NaN
1 313 NaN NaN
2 313 NaN NaN
3 313 NaN NaN
4 313 NaN NaN
5 313 NaN NaN
...
Columns like R1
and R2
are cities and there can be many. The code
column has unique product codes. They can also be many. The scale
marks the number of the week for which I am making calculations. For each code
, it's five weeks. Also I have an empty dataframe, in which I need to record the results of calculations.
I will give an example of a specific calculation, I hope it will be clearer. In an empty dataframe I take a cell where the scale
is 1
and the code
is 121
and columnn R1
. I find the corresponding value in df_stock
. It is equal to 100
. from it I subtract the value from df_rate
from the column R1
, where the scale
is 1
. If this value is greater than the value from df_min
from the column R1
, where the scale
is 1
, I write it down, if less, then I take the value from df_max
from column R1
, where the scale
is 1
. It turned out 80
, it is more than 50
, so I write it down. For the next cell, I subtract from the calculated value(80
) next value in the column R1
, where the scale
is 2
. And I also check it. It turned out 50
. I write it down. And so on. From the table df_stock
I take the first value, from which I begin to subtract.
In fact, df_min
is the minimum value and if I go below it, then I need to copy the corresponding value from df_max
.
These calculations must be made for each unique code
(each code
is represented in the scale
range from 1
to 5
) and for each column like R1
and R2
.
The result of the example is as follows:
scale code R1 R2 ...
1 121 80 110
2 121 50 90
3 121 70 60
4 121 55 90
5 121 45 60
1 313 60 60
2 313 45 50
3 313 20 40
4 313 50 25
5 313 30 20
...
I will be very grateful for any help!
UPD I wrote a script that does what I need. It is not optimal, but I have no other ideas. Is it possible to change it and add to it a loop? In my original data scale
is between 1
and 26
and with my code I have to write a condition for each value from the scale
.
My code:
import pandas as pd
import numpy as np
a = (1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
b = (121,121,121,121,121,313,313,313,313,313,444,444,444,444,444)
columns = ['scale', 'code', 'R1', 'R2', 'R3']
index = np.arange(15)
df_min = pd.DataFrame(columns=columns, index = index)
df_min['scale'] = a
df_min['code'] = b
df_min['R1'] = np.random.randint(10, 50, size=15)
df_min['R2'] = np.random.randint(10, 50, size=15)
df_min['R3'] = np.random.randint(10, 50, size=15)
df_rate = pd.DataFrame(columns=columns, index = index)
df_rate['scale'] = a
df_rate['code'] = b
df_rate['R1'] = np.random.randint(5, 40, size=15)
df_rate['R2'] = np.random.randint(5, 40, size=15)
df_rate['R3'] = np.random.randint(5, 40, size=15)
df_max = pd.DataFrame(columns=columns, index = index)
df_max['scale'] = a
df_max['code'] = b
df_max['R1'] = np.random.randint(50, 150, size=15)
df_max['R2'] = np.random.randint(50, 150, size=15)
df_max['R3'] = np.random.randint(50, 150, size=15)
index1 = np.arange(3)
df_stock = pd.DataFrame(columns=columns, index = index1)
df_stock['code'] = (121,313,444)
df_stock['R1'] = np.random.randint(100, 300, size=3)
df_stock['R2'] = np.random.randint(100, 300, size=3)
df_stock['R3'] = np.random.randint(100, 300, size=3)
df_new = pd.DataFrame(columns=columns, index = index)
df_new['scale'] = a
df_new['code'] = b
# set the index to 'code' to subtract df_rate from df_stock
df_stock = df_stock.set_index('code')
df_rate = df_rate.set_index('code')
df_new = df_stock - df_rate
# have to add back in the 'scale' column since it wasn't present in df_rate
df_new['scale'] = df_rate['scale']
# now set the index to use both 'code' and 'scale'
df_new = df_new.reset_index()
df_new = df_new.set_index(['code', 'scale'])
df_min = df_min.set_index(['code', 'scale'])
df_max = df_max.set_index(['code', 'scale'])
df_new = df_new.mask(df_new < df_min, df_max)
df_min = df_min.reset_index()
df_min.insert(2, 'test', 0)
df_max = df_max.reset_index()
df_max.insert(2, 'test', 0)
df_new = df_new.reset_index()
df_new.insert(2, 'test', 0)
df_rate = df_rate.reset_index()
df_rate.insert(2, 'test', 0)
df_new.loc[df_new['scale'].between(2,5), 'test':] = np.nan
df_rate_p = df_rate.loc[df_rate['scale'] == 2, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 1, 'test':] - df_rate.loc[df_rate['scale'] == 2, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)
df_rate_p = df_rate.loc[df_rate['scale'] == 3, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 2, 'test':] - df_rate.loc[df_rate['scale'] == 3, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)
df_rate_p = df_rate.loc[df_rate['scale'] == 4, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 3, 'test':] - df_rate.loc[df_rate['scale'] == 4, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)
df_rate_p = df_rate.loc[df_rate['scale'] == 5, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 4, 'test':] - df_rate.loc[df_rate['scale'] == 5, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)
df_new
Upvotes: 0
Views: 80
Reputation: 476
using pandas index
and MultiIndex
is very useful for comparing the correct rows with eachother.
Here is how you would use it:
# set the index to 'code' to subtract df_rate from df_stock
df_stock = df_stock.set_index('code')
df_rate = df_rate.set_index('code')
df_new = df_stock - df_rate
# have to add back in the 'scale' column since it wasn't present in df_rate
df_new['scale'] = df_rate['scale']
# now set the index to use both 'code' and 'scale'
df_new = df_new.reset_index()
df_new = df_new.set_index(['code', 'scale'])
df_min = df_min.set_index(['code', 'scale'])
df_max = df_max.set_index(['code', 'scale'])
# you may not actually need these lines, but sometimes it is necessary!
# intersection = df_new.index.intersection(df_min.index).intersection(df_max.index)
# df_new = df_new.loc[intersection]
# df_min = df_min.loc[intersection]
# df_max = df_max.loc[intersection]
# if df_new < df_max, then use values from df_max
# I didn't actually understand what you meant to do with the
# the data, so this is probably not quite what you intended,
# but you can can use this to see how it works and implement
# your algorithm
df_new = df_new.mask(df_new < df_min, df_max)
Upvotes: 1