Rach Odwyer
Rach Odwyer

Reputation: 184

Update values in a dataframe at every nth value with a group by

I have dataframe for which i would like to populate with some values in 'snail_chart_dates' without using a loop.

The values I would like are the first 'value_date' and the last 'value_date' and the middle 'value_date' grouped by code_id

I think this could be done maybe with a combination of .iloc[0] and .iloc[-1] but I have no idea how to apply this to a groupby or how to get the middle

Example trimmed data frame - other one is much longer

enter image description here

    import pandas as pd
    import numpy as np 

    returns = pd.DataFrame({
            'value_date' : ['2018-01-31', '2018-02-28', '2018-03-31','2018-04-30', '2018-05-31', '2018-06-30', 
                            '2018-01-31', '2018-02-28', '2018-03-31','2018-04-30', '2018-05-31', '2018-06-30'],
            'code_id' :  ['AUD','AUD','AUD','AUD','AUD','AUD', 
                          'USD','USD','USD','USD','USD','USD'],
            'gross_return': [.01, .02, .03, -.4, -.06, -.02, 
                             .06, .8, .9, .4, -1.06, .03],
            'bm_return': [.01, .02, .03, -.4, -.06, -.02, 
                             .06, .8, .9, .4, -1.06, .03],
            })


    returns["snail_chart_dates"] = ""

example desired result - without middle

enter image description here

Upvotes: 0

Views: 254

Answers (2)

orlp
orlp

Reputation: 117691

Assuming you put value_date into the Pandas datetime format, there is a very simple solution.

We can group by the code, and use the .quantile() function to get our dates.

returns["value_date"] = pd.to_datetime(returns["value_date"])
code_groups = returns.groupby("code_id")["value_date"]
code_groups.quantile(0, interpolation="nearest")
code_id
AUD   2018-01-31
USD   2018-01-31
Name: value_date, dtype: datetime64[ns]
code_groups.quantile(0.5, interpolation="nearest")
code_id
AUD   2018-03-31
USD   2018-03-31
Name: value_date, dtype: datetime64[ns]

You can then assign this information as you wish into your table.

Upvotes: 0

Erfan
Erfan

Reputation: 42916

First we get the min and max date per group.

Then we get the indices of these values per group with idxmin and idxmax.

To get the middle values we get the median of the index per group and round up with np.ceil

And finally we assign these values to our new column with loc:

grp = returns.groupby('code_id')
s1 = grp['value_date'].transform('min')
s2 = grp['value_date'].transform('max')
s3 = grp.apply(lambda x: np.ceil(np.median(x.index))).values

idx_min = grp['value_date'].idxmin().values
idx_max = grp['value_date'].idxmax().values

returns.loc[idx_min, 'snail_chart_dates'] = s1.loc[idx_min]
returns.loc[idx_max, 'snail_chart_dates'] = s2.loc[idx_max]
returns.loc[s3, 'snail_chart_dates'] = returns.loc[s3, 'value_date']

   value_date code_id  gross_return  bm_return snail_chart_dates
0  2018-01-31     AUD          0.01       0.01        2018-01-31
1  2018-02-28     AUD          0.02       0.02               NaT
2  2018-03-31     AUD          0.03       0.03               NaT
3  2018-04-30     AUD         -0.40      -0.40        2018-04-30
4  2018-05-31     AUD         -0.06      -0.06               NaT
5  2018-06-30     AUD         -0.02      -0.02        2018-06-30
6  2018-01-31     USD          0.06       0.06        2018-01-31
7  2018-02-28     USD          0.80       0.80               NaT
8  2018-03-31     USD          0.90       0.90               NaT
9  2018-04-30     USD          0.40       0.40        2018-04-30
10 2018-05-31     USD         -1.06      -1.06               NaT
11 2018-06-30     USD          0.03       0.03        2018-06-30

Upvotes: 1

Related Questions