Woox
Woox

Reputation: 33

Divide value having same date in Dataframe to the count of values in each date

I have two DataFrames one have values against various dates which you can replicate for vlaues of first date as following

import pandas as pd
values_df = pd.DataFrame({'date': ['2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16','2021-08-16', '2021-08-16'], 
                   'val': ['0.38956', '0.733198', '0.1856580', '-0.759235', '0.112378', '0.7436924' , '0.32941900', '0.7423981', '0.71269175', '0.127698146']})

other one having count of those values for each date, it can be replicated using

count = pd.DataFrame({'date': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21', '2021-08-22', '2021-08-23','2021-08-24', '2021-08-25'], 
                      'count': ['12', '47', '85', '65', '36', '126' , '75', '85', '175', '63']})

now there are 12 values for the first date 2021-08-16 I need to divide every value 2021-08-16 of with 12 and same with all other values according to their dates and number of counts. I have tried to create new column of values_df that has number of counts so that I can do values_df['val'] / values_df['count']but it doesn't repeat values for each date. Can someone guide me how can I achieve this? Thank you

Upvotes: 1

Views: 100

Answers (2)

Clément Perroud
Clément Perroud

Reputation: 533

You can use the join function on dataframe this way :

import pandas as pd
values_df = pd.DataFrame({'date': ['2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16','2021-08-16', '2021-08-16'],
                   'val': ['0.38956', '0.733198', '0.1856580', '-0.759235', '0.112378', '0.7436924' , '0.32941900', '0.7423981', '0.71269175', '0.127698146']})

count = pd.DataFrame({'date': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21', '2021-08-22', '2021-08-23','2021-08-24', '2021-08-25'],
                      'count': ['12', '47', '85', '65', '36', '126' , '75', '85', '175', '63']})


values_df = values_df.set_index('date').join(count.set_index('date'))
#now, you datafram values_df have a val and count value for each date

#you can create your new column like that (but you need to make sure that every val have a defined count value !=0
new_colums = [ float(values_df['val'][i]) / float(values_df['count'][i]) for i in range(len(values_df['val']))]

values_df['name of your new column (var/count)'] = new_colums

print(values_df)

It will display this :

                    val count  name of your new column (var/count)
date                                                              
2021-08-16      0.38956    12                             0.032463
2021-08-16     0.733198    12                             0.061100
2021-08-16    0.1856580    12                             0.015471
2021-08-16    -0.759235    12                            -0.063270
2021-08-16     0.112378    12                             0.009365
2021-08-16    0.7436924    12                             0.061974
2021-08-16   0.32941900    12                             0.027452
2021-08-16    0.7423981    12                             0.061867
2021-08-16   0.71269175    12                             0.059391
2021-08-16  0.127698146    12                             0.010642

Upvotes: 1

jezrael
jezrael

Reputation: 863166

If want reuse count DataFrame use Series.map:

values_df['new'] = values_df['val'].astype(float) / values_df['date'].map(count.set_index('date')['count'].astype(int))
print (values_df)
         date          val       new
0  2021-08-16      0.38956  0.032463
1  2021-08-16     0.733198  0.061100
2  2021-08-16    0.1856580  0.015471
3  2021-08-16    -0.759235 -0.063270
4  2021-08-16     0.112378  0.009365
5  2021-08-16    0.7436924  0.061974
6  2021-08-16   0.32941900  0.027452
7  2021-08-16    0.7423981  0.061867
8  2021-08-16   0.71269175  0.059391
9  2021-08-16  0.127698146  0.010642

Or if want count number of values in values_df only use Series.map with Series.value_counts:

values_df['new'] = values_df['val'].astype(float) / values_df['date'].map(values_df['date'].value_counts())
print (values_df)
         date          val       new
0  2021-08-16      0.38956  0.038956
1  2021-08-16     0.733198  0.073320
2  2021-08-16    0.1856580  0.018566
3  2021-08-16    -0.759235 -0.075924
4  2021-08-16     0.112378  0.011238
5  2021-08-16    0.7436924  0.074369
6  2021-08-16   0.32941900  0.032942
7  2021-08-16    0.7423981  0.074240
8  2021-08-16   0.71269175  0.071269
9  2021-08-16  0.127698146  0.012770

Upvotes: 1

Related Questions