Reputation: 33
I have two DataFrames one have values against various dates which you can replicate for vlaues of first date as following
import pandas as pd
values_df = pd.DataFrame({'date': ['2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16','2021-08-16', '2021-08-16'],
'val': ['0.38956', '0.733198', '0.1856580', '-0.759235', '0.112378', '0.7436924' , '0.32941900', '0.7423981', '0.71269175', '0.127698146']})
other one having count of those values for each date, it can be replicated using
count = pd.DataFrame({'date': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21', '2021-08-22', '2021-08-23','2021-08-24', '2021-08-25'],
'count': ['12', '47', '85', '65', '36', '126' , '75', '85', '175', '63']})
now there are 12 values for the first date 2021-08-16
I need to divide every value 2021-08-16
of with 12 and same with all other values according to their dates and number of counts. I have tried to create new column of values_df
that has number of counts so that I can do values_df['val'] / values_df['count']
but it doesn't repeat values for each date. Can someone guide me how can I achieve this? Thank you
Upvotes: 1
Views: 100
Reputation: 533
You can use the join function on dataframe this way :
import pandas as pd
values_df = pd.DataFrame({'date': ['2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16','2021-08-16', '2021-08-16'],
'val': ['0.38956', '0.733198', '0.1856580', '-0.759235', '0.112378', '0.7436924' , '0.32941900', '0.7423981', '0.71269175', '0.127698146']})
count = pd.DataFrame({'date': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21', '2021-08-22', '2021-08-23','2021-08-24', '2021-08-25'],
'count': ['12', '47', '85', '65', '36', '126' , '75', '85', '175', '63']})
values_df = values_df.set_index('date').join(count.set_index('date'))
#now, you datafram values_df have a val and count value for each date
#you can create your new column like that (but you need to make sure that every val have a defined count value !=0
new_colums = [ float(values_df['val'][i]) / float(values_df['count'][i]) for i in range(len(values_df['val']))]
values_df['name of your new column (var/count)'] = new_colums
print(values_df)
It will display this :
val count name of your new column (var/count)
date
2021-08-16 0.38956 12 0.032463
2021-08-16 0.733198 12 0.061100
2021-08-16 0.1856580 12 0.015471
2021-08-16 -0.759235 12 -0.063270
2021-08-16 0.112378 12 0.009365
2021-08-16 0.7436924 12 0.061974
2021-08-16 0.32941900 12 0.027452
2021-08-16 0.7423981 12 0.061867
2021-08-16 0.71269175 12 0.059391
2021-08-16 0.127698146 12 0.010642
Upvotes: 1
Reputation: 863166
If want reuse count
DataFrame
use Series.map
:
values_df['new'] = values_df['val'].astype(float) / values_df['date'].map(count.set_index('date')['count'].astype(int))
print (values_df)
date val new
0 2021-08-16 0.38956 0.032463
1 2021-08-16 0.733198 0.061100
2 2021-08-16 0.1856580 0.015471
3 2021-08-16 -0.759235 -0.063270
4 2021-08-16 0.112378 0.009365
5 2021-08-16 0.7436924 0.061974
6 2021-08-16 0.32941900 0.027452
7 2021-08-16 0.7423981 0.061867
8 2021-08-16 0.71269175 0.059391
9 2021-08-16 0.127698146 0.010642
Or if want count number of values in values_df
only use Series.map
with Series.value_counts
:
values_df['new'] = values_df['val'].astype(float) / values_df['date'].map(values_df['date'].value_counts())
print (values_df)
date val new
0 2021-08-16 0.38956 0.038956
1 2021-08-16 0.733198 0.073320
2 2021-08-16 0.1856580 0.018566
3 2021-08-16 -0.759235 -0.075924
4 2021-08-16 0.112378 0.011238
5 2021-08-16 0.7436924 0.074369
6 2021-08-16 0.32941900 0.032942
7 2021-08-16 0.7423981 0.074240
8 2021-08-16 0.71269175 0.071269
9 2021-08-16 0.127698146 0.012770
Upvotes: 1