vp7

Reputation: 398

Generate data with a specific trend

I have a dataset (df1) that has 2 columns.

F_Date      B_Date
01/09/2019  02/08/2019
01/09/2019  03/08/2019
02/09/2019  03/08/2019
01/09/2019  04/08/2019
02/09/2019  04/08/2019
03/09/2019  04/08/2019
02/09/2019  05/08/2019
03/09/2019  05/08/2019
04/09/2019  05/08/2019
01/09/2019  06/08/2019
02/09/2019  06/08/2019
03/09/2019  06/08/2019
04/09/2019  06/08/2019
05/09/2019  06/08/2019
02/09/2019  07/08/2019
03/09/2019  07/08/2019
04/09/2019  07/08/2019
05/09/2019  07/08/2019
06/09/2019  07/08/2019
02/09/2019  08/08/2019
03/09/2019  08/08/2019

I want to generate a new column value_1 such that:

  1. For each date_1, the aggregated value_1 should not exceed 5000.

  2. date_2 and value_1 should have an increasing trend, i.e. aggregated on date_2, value_1 should increase each day. For example, if the aggregated value_1 for one date_2 is 1000, then the aggregated value for the next date_2 should be greater than 1000.

The dataframe has unique (date_1, date_2) tuples.
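The two requirements can be stated as a small check, which may help when testing any generated column. This is only a sketch: the function name `check_constraints` and its parameters are hypothetical, and the column names are passed in because the mapping of date_1/date_2 to F_Date/B_Date is not spelled out above.

```python
import pandas as pd

def check_constraints(df, cap_col, trend_col, value_col="value_1", cap=5000):
    """Return (cap_ok, trend_ok) for the two requirements (hypothetical helper)."""
    # Requirement 1: the total per cap_col value must not exceed the cap.
    cap_ok = bool((df.groupby(cap_col)[value_col].sum() <= cap).all())
    # Requirement 2: totals per trend_col date, in chronological order,
    # must be strictly increasing. Dates are assumed to be dd/mm/yyyy strings.
    dates = pd.to_datetime(df[trend_col], format="%d/%m/%Y")
    totals = df.groupby(dates)[value_col].sum().sort_index()
    trend_ok = bool(totals.diff().dropna().gt(0).all())
    return cap_ok, trend_ok
```

For example, `check_constraints(df, "F_Date", "B_Date")` would test the cap per F_Date and the trend over B_Date, if that is the intended mapping.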

I was thinking of the following approach:

Step 1: F_Date has values from 01/09/2019 to 30/09/2019. I want to generate a value_1 for each F_Date such that it has an increasing trend and lies in the range 50-25000.

Step 2: Once we have a new dataframe (df2 from step 1) with F_Date and value_1, we read the existing dataframe (df1) and assign value_1 in such a way that there is an increasing trend over B_Date as well.

For example:

Let's say df2 has an entry of 50 for F_Date 01/09/2019, and df1 has 3 B_Date values corresponding to that F_Date. Then the dataframe would look like:

EXPECTED OUTPUT :

     F_Date     B_Date     value_1
    01/09/2019  02/08/2019  5
    01/09/2019  02/08/2019  15
    01/09/2019  02/08/2019  30

I can't figure out how to achieve the increasing trend in both steps. Can anyone help with that, and also with step 2?

Thanks

Upvotes: 3

Views: 227

Answers (1)

dper

Reputation: 904

I might be wrong, but your question is quite vague about how the trend should be generated. To the best of my understanding, this is how you could go about it:

Step 1

Generate the new column value_1 with the trend using:

import numpy as np

min_y = 50
max_y = 5000
min_x = 1
# any number max_x can be chosen
# this number controls the shape of the logarithm, therefore the final distribution
max_x = 10

# generate (uniformly) and sort 30 random float x in [min_x, max_x)
x = np.sort(np.random.uniform(min_x, max_x, 30))
# get log(x), i.e. values in [log(min_x), log(max_x))
log_x = np.log(x)
# scale log(x) to the new range [min_y, max_y)
y = (max_y - min_y) * ((log_x - np.log(min_x)) / (np.log(max_x) - np.log(min_x))) + min_y

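Once you have the data, put it in a dataframe (df2) keyed by F_Date and join it to the other dataframe, so that value_1 ends up in the dataset. Note that pd.merge performs an inner join by default; pass how='outer' if you need an outer join.

import pandas as pd

joined_df = pd.merge(df1, df2, on='F_Date')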
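The step from the generated values to df2 is not shown above, so here is a minimal end-to-end sketch. It assumes one value per day of September 2019, dd/mm/yyyy strings for F_Date, a fixed random seed, and a small stand-in for df1 built from the first rows of the question's table; a left join is used here so that every row of df1 is kept.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility

min_y, max_y = 50, 5000
min_x, max_x = 1, 10

# Same log-shaped trend as above: sorted uniform draws, log-scaled to [min_y, max_y).
x = np.sort(rng.uniform(min_x, max_x, 30))
y = (max_y - min_y) * (np.log(x) - np.log(min_x)) / (np.log(max_x) - np.log(min_x)) + min_y

# One value per day of September 2019, formatted like F_Date in df1.
f_dates = pd.date_range("2019-09-01", "2019-09-30", freq="D").strftime("%d/%m/%Y")
df2 = pd.DataFrame({"F_Date": f_dates, "value_1": y})

# Stand-in for df1: the first rows of the table in the question.
df1 = pd.DataFrame({
    "F_Date": ["01/09/2019", "01/09/2019", "02/09/2019", "01/09/2019"],
    "B_Date": ["02/08/2019", "03/08/2019", "03/08/2019", "04/08/2019"],
})

# Left join: each df1 row receives the value_1 generated for its F_Date.
joined_df = pd.merge(df1, df2, on="F_Date", how="left")
```

After this merge, every row with the same F_Date carries the same value_1, so the value still has to be distributed over the B_Date rows if it is meant to be a per-F_Date total.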

I am not sure how you need the second trend to work; more details would help.
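If the intent is what the expected output in the question suggests (a per-F_Date total of 50 split into growing pieces such as 5, 15, 30), one possible reading of step 2 can be sketched as follows. It assumes joined_df repeats the per-F_Date total on every row, and it uses linearly growing weights 1, 2, ..., n, which is an arbitrary choice. The helper name `split_increasing` is hypothetical.

```python
import pandas as pd

def split_increasing(joined, total_col="value_1"):
    """Split each F_Date total over its B_Date rows with weights 1..n (sketch)."""
    out = joined.copy()
    # Order rows chronologically by B_Date within each F_Date (dd/mm/yyyy assumed).
    out["_b"] = pd.to_datetime(out["B_Date"], format="%d/%m/%Y")
    out = out.sort_values(["F_Date", "_b"])
    groups = out.groupby("F_Date")
    rank = groups.cumcount() + 1              # 1, 2, ..., n within each F_Date
    n = groups["F_Date"].transform("count")   # group size n on every row
    # Weights 1..n sum to n * (n + 1) / 2, so the shares add up to the total.
    out[total_col] = out[total_col] * rank / (n * (n + 1) / 2)
    return out.drop(columns="_b")
```

This makes value_1 increase with B_Date inside each F_Date and keeps each F_Date total unchanged. It does not by itself guarantee that the totals aggregated per B_Date increase, since those depend on how many F_Date rows each B_Date has.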

Upvotes: 1
