Reputation: 1998
If I have a df like the one below:
date       | id
12/02/2012 | b2
12/03/2013 | b6
11/23/2013 | b3
I want to add two new columns, fake_rates and fake_minutes, filled with mock/fake data, where the rates are anywhere from 0.00 to 3.00 and the minutes are anywhere from 0.0 to 30.0:
date       | id | fake_rates | fake_minutes
12/02/2012 | b2 | 1.05       | 2.0
12/03/2013 | b6 | 0.56       | 1.6
12/03/2013 | b8 | 0.33       | 11.2
11/23/2013 | b3 | 0.19       | 122.0
and then group them by date, where rates and minutes are the averages for each date.
Example output:
date       | rates | minutes
12/01/2012 | 1.39  | 23.00
12/02/2012 | 1.29  | 22.33
Thanks!
Upvotes: 1
Views: 1490
Reputation: 62493
Use numpy.random.uniform, because it has low and high parameters to specify the value range, and numpy.round to specify the number of decimal places for the data.

import numpy as np
import pandas as pd
# setup the dataframe; each date appears twice so the groupby mean averages multiple rows
df = pd.DataFrame({
    'date': ['12/02/2012', '12/03/2013', '11/23/2013', '12/02/2012', '12/03/2013', '11/23/2013'],
    'id': ['b2', 'b6', 'b3', 'b2', 'b6', 'b3']
})
# add synthetic data
np.random.seed(365)
df['fake_minutes'] = np.round(np.random.uniform(0.0, 30.0, size=(len(df), 1)), 2)
df['fake_rates'] = np.round(np.random.uniform(0.0, 3.0, size=(len(df), 1)), 2)
# set the date to a datetime format
df.date = pd.to_datetime(df.date)
# display(df)
date id fake_minutes fake_rates
0 2012-12-02 b2 28.24 2.30
1 2013-12-03 b6 19.25 0.92
2 2013-11-23 b3 20.54 1.33
3 2012-12-02 b2 17.66 0.33
4 2013-12-03 b6 16.32 1.32
5 2013-11-23 b3 11.04 2.26
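As an aside, the same two columns could be generated with the newer NumPy Generator API (numpy >= 1.17). This is only a sketch of an alternative, not what produced the output above; a Generator seeded with 365 draws a different stream than np.random.seed(365), so the numbers (and the grouped means below) would change:

# alternative: the Generator API; a separate random stream from np.random.seed
rng = np.random.default_rng(365)
df['fake_minutes'] = np.round(rng.uniform(0.0, 30.0, size=len(df)), 2)
df['fake_rates'] = np.round(rng.uniform(0.0, 3.0, size=len(df)), 2)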
# groupby and aggregate the mean
dfg = df.groupby('date', as_index=False).agg({'fake_minutes': 'mean', 'fake_rates': 'mean'})
# display(dfg)  # the dates in dfg are all unique; each row is the mean for that date
date fake_minutes fake_rates
0 2012-12-02 22.950 1.315
1 2013-11-23 15.790 1.795
2 2013-12-03 17.785 1.120
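If you want the result to use the column names from the desired output in the question (rates and minutes) and show two decimals, one way, building on dfg above, would be (the names here are just taken from the question):

# optional: rename to the requested column names and round the means to 2 decimals
out = dfg.rename(columns={'fake_rates': 'rates', 'fake_minutes': 'minutes'}).round(2)
# out has the columns: date, minutes, rates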
Upvotes: 2