Reputation: 398
I have a dataset (df1) that has 2 columns:
F_Date B_Date
01/09/2019 02/08/2019
01/09/2019 03/08/2019
02/09/2019 03/08/2019
01/09/2019 04/08/2019
02/09/2019 04/08/2019
03/09/2019 04/08/2019
02/09/2019 05/08/2019
03/09/2019 05/08/2019
04/09/2019 05/08/2019
01/09/2019 06/08/2019
02/09/2019 06/08/2019
03/09/2019 06/08/2019
04/09/2019 06/08/2019
05/09/2019 06/08/2019
02/09/2019 07/08/2019
03/09/2019 07/08/2019
04/09/2019 07/08/2019
05/09/2019 07/08/2019
06/09/2019 07/08/2019
02/09/2019 08/08/2019
03/09/2019 08/08/2019
I want to generate a new column value_1 such that:
For each date_1, the aggregated value_1 should not exceed 5000.
value_1 should have an increasing trend over date_2, i.e. aggregated on date_2, value_1 should increase each day. For example, if the aggregated value_1 for one date_2 is 1000, then the value for the next date_2 should be greater than 1000.
The dataframe has unique (date_1, date_2) tuples.
After some thought, I came up with the following approach:
Step 1: F_Date has values from 01/09/2019 to 30/09/2019. I want to generate a value_1 for each F_Date such that it has an increasing trend and stays in the range of 50-25000.
Step 2: Once we have a new dataframe (df2 from step 1) with F_Date and value_1, we read the existing dataframe (df1) and assign value_1 in such a way that there is an increasing trend over B_Date as well.
For example :
Let's say that for F_Date in df2 we have an entry for 01/01/2019 with value 50, and in df1 there are 3 B_Date values corresponding to that F_Date. Then the dataframe would look like:
EXPECTED OUTPUT :
F_Date B_Date value_1
01/09/2019 02/08/2019 5
01/09/2019 02/08/2019 15
01/09/2019 02/08/2019 30
I can't figure out how to achieve the increasing trend for both steps. Can anyone help with that, and also with step 2?
Thanks
Upvotes: 3
Views: 227
Reputation: 904
I might be wrong, but your question is quite vague about how the trend should be generated. To the best of my understanding, this is how you could go about it:
Step 1
Generate the new column value_1 with an increasing trend using:
import numpy as np
min_y = 50
max_y = 5000
min_x = 1
# any number max_x can be chosen
# this number controls the shape of the logarithm, therefore the final distribution
max_x = 10
# generate (uniformly) and sort 30 random float x in [min_x, max_x)
x = np.sort(np.random.uniform(min_x, max_x, 30))
# get log(x), i.e. values in [log(min_x), log(max_x))
log_x = np.log(x)
# scale log(x) to the new range [min_y, max_y)
y = (max_y - min_y) * ((log_x - np.log(min_x)) / (np.log(max_x) - np.log(min_x))) + min_y
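To turn these values into the df2 from the question, one option is to pair them with the September dates. This is only a sketch: the helper name `make_trend`, the seed, and the assumption that F_Date is stored as dd/mm/YYYY strings are mine, not something stated in the question.

```python
import numpy as np
import pandas as pd

def make_trend(n, min_y=50, max_y=5000, min_x=1, max_x=10, seed=None):
    """Return n sorted values in [min_y, max_y) with a log-shaped trend.

    Hypothetical helper wrapping the snippet above; seed is only for
    reproducibility.
    """
    rng = np.random.default_rng(seed)
    # sorted uniform samples, so the log-scaled result is non-decreasing
    x = np.sort(rng.uniform(min_x, max_x, n))
    log_x = np.log(x)
    scale = (log_x - np.log(min_x)) / (np.log(max_x) - np.log(min_x))
    return (max_y - min_y) * scale + min_y

# One row per F_Date from 01/09/2019 to 30/09/2019.
# Assumption: df1 stores dates as dd/mm/YYYY strings, so df2 does the same.
dates = pd.date_range("2019-09-01", "2019-09-30", freq="D")
df2 = pd.DataFrame({
    "F_Date": dates.strftime("%d/%m/%Y"),
    "value_1": make_trend(len(dates), seed=0),
})
```

Because x is sorted before taking the logarithm, value_1 comes out in increasing order of F_Date without any further work.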
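Once you have the data, put it in a dataframe (df2) keyed by F_Date and do an outer join to the other dataframe, which gives you a dataframe with value_1 in the dataset. Note that pd.merge does an inner join by default, so the outer join has to be requested explicitly:
import pandas as pd
joined_df = pd.merge(df1, df2, on='F_Date', how='outer')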
I am not sure how you need the second trend to work; more details would help.
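If the second trend means what the 5 / 15 / 30 example in the question suggests, i.e. each F_Date total is split across its B_Date rows in increasing amounts, then something like the following might work. This is one possible reading, not a confirmed requirement: the function name `split_increasing`, the linear weights 1, 2, ..., k, and the dd/mm/YYYY date format are all assumptions. It also does not guarantee that the sum per B_Date increases from one B_Date to the next.

```python
import pandas as pd

def split_increasing(df1, df2):
    """Split each F_Date total from df2 across its B_Date rows in df1.

    Hypothetical helper: within one F_Date, the k rows get the weights
    1, 2, ..., k (earliest B_Date first), so the shares increase with
    B_Date and add up to that F_Date's total.
    """
    totals = df2.rename(columns={"value_1": "total"})
    merged = df1.merge(totals, on="F_Date", how="left")
    # parse the dates so that sorting is chronological, not alphabetical
    merged["_f"] = pd.to_datetime(merged["F_Date"], format="%d/%m/%Y")
    merged["_b"] = pd.to_datetime(merged["B_Date"], format="%d/%m/%Y")
    merged = merged.sort_values(["_f", "_b"]).reset_index(drop=True)
    # position of each row within its F_Date group: 1, 2, ..., k
    rank = merged.groupby("F_Date").cumcount() + 1
    weight_sum = rank.groupby(merged["F_Date"]).transform("sum")
    merged["value_1"] = merged["total"] * rank / weight_sum
    return merged.drop(columns=["_f", "_b", "total"])

# Tiny made-up example, not the data from the question.
df1_small = pd.DataFrame({
    "F_Date": ["01/09/2019"] * 3 + ["02/09/2019"] * 2,
    "B_Date": ["02/08/2019", "03/08/2019", "04/08/2019",
               "03/08/2019", "04/08/2019"],
})
df2_small = pd.DataFrame({
    "F_Date": ["01/09/2019", "02/09/2019"],
    "value_1": [60.0, 90.0],
})
result = split_increasing(df1_small, df2_small)
```

With these inputs, the total of 60 for 01/09/2019 is split in the ratio 1:2:3 and the total of 90 for 02/09/2019 in the ratio 1:2. Any other increasing weights (for example random sorted ones) could be swapped in the same way.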
Upvotes: 1