Reputation: 184
I have dataframe for which i would like to populate with some values in 'snail_chart_dates' without using a loop.
The values I would like are the first 'value_date' and the last 'value_date' and the middle 'value_date' grouped by code_id
I think this could be done maybe with a combination of .iloc[0]
and .iloc[-1]
but I have no idea how to apply this to a groupby
or how to get the middle
Example trimmed data frame - other one is much longer
import pandas as pd
import numpy as np
returns = pd.DataFrame({
'value_date' : ['2018-01-31', '2018-02-28', '2018-03-31','2018-04-30', '2018-05-31', '2018-06-30',
'2018-01-31', '2018-02-28', '2018-03-31','2018-04-30', '2018-05-31', '2018-06-30'],
'code_id' : ['AUD','AUD','AUD','AUD','AUD','AUD',
'USD','USD','USD','USD','USD','USD'],
'gross_return': [.01, .02, .03, -.4, -.06, -.02,
.06, .8, .9, .4, -1.06, .03],
'bm_return': [.01, .02, .03, -.4, -.06, -.02,
.06, .8, .9, .4, -1.06, .03],
})
returns["snail_chart_dates"] = ""
example desired result - without middle
Upvotes: 0
Views: 254
Reputation: 117691
Assuming you put value_date
into the Pandas datetime format, there is a very simple solution.
We can group by the code, and use the .quantile()
function to get our dates.
returns["value_date"] = pd.to_datetime(returns["value_date"])
code_groups = returns.groupby("code_id")["value_date"]
code_groups.quantile(0, interpolation="nearest")
code_id
AUD 2018-01-31
USD 2018-01-31
Name: value_date, dtype: datetime64[ns]
code_groups.quantile(0.5, interpolation="nearest")
code_id
AUD 2018-03-31
USD 2018-03-31
Name: value_date, dtype: datetime64[ns]
You can then assign this information as you wish into your table.
Upvotes: 0
Reputation: 42916
First we get the min
and max
date per group.
Then we get the indices of these values per group with idxmin
and idxmax
.
To get the middle values we get the median
of the index per group and round up
with np.ceil
And finally we assign these values to our new column with loc
:
grp = returns.groupby('code_id')
s1 = grp['value_date'].transform('min')
s2 = grp['value_date'].transform('max')
s3 = grp.apply(lambda x: np.ceil(np.median(x.index))).values
idx_min = grp['value_date'].idxmin().values
idx_max = grp['value_date'].idxmax().values
returns.loc[idx_min, 'snail_chart_dates'] = s1.loc[idx_min]
returns.loc[idx_max, 'snail_chart_dates'] = s2.loc[idx_max]
returns.loc[s3, 'snail_chart_dates'] = returns.loc[s3, 'value_date']
value_date code_id gross_return bm_return snail_chart_dates
0 2018-01-31 AUD 0.01 0.01 2018-01-31
1 2018-02-28 AUD 0.02 0.02 NaT
2 2018-03-31 AUD 0.03 0.03 NaT
3 2018-04-30 AUD -0.40 -0.40 2018-04-30
4 2018-05-31 AUD -0.06 -0.06 NaT
5 2018-06-30 AUD -0.02 -0.02 2018-06-30
6 2018-01-31 USD 0.06 0.06 2018-01-31
7 2018-02-28 USD 0.80 0.80 NaT
8 2018-03-31 USD 0.90 0.90 NaT
9 2018-04-30 USD 0.40 0.40 2018-04-30
10 2018-05-31 USD -1.06 -1.06 NaT
11 2018-06-30 USD 0.03 0.03 2018-06-30
Upvotes: 1