The Great
The Great

Reputation: 7693

Shift dates by value unique for each subject

I have a dataframe like as shown below

df1 = pd.DataFrame({'person_id': [11, 21, 31, 41, 51],
                        'date_1': ['12/30/1961', '05/29/1967', '02/03/1957', '7/27/1959', '01/13/1971'],
                        'date_2': ['07/23/2017','05/29/2017','02/03/2015',np.nan,np.nan]})
df1 = df1.melt('person_id', value_name='dates')
df1['dates'] = pd.to_datetime(df1['dates'])
df1 = df1.assign(pd= (df1['dates'].dt.dayofyear - 1),
     nd=((df1['dates'] + pd.offsets.YearEnd(1)) - df1['dates']).dt.days)

With the help of this post, I am able to achieve part of what I wanted to.

Now what I would like to do is

a) Shift the dates backward (subtract) based on minimum of pd value for each subject b) Shift the dates forward (add) based on minimum of nd value for each subject c) Check whether the year component remains the same between 3 columns dates, shift_forward and shift_backward

So, I got the minimum of pd and nd values using the code below

min_pd = df1.groupby(['person_id'])['pd'].min()
min_nd = df1.groupby(['person_id'])['nd'].min()
year_change = df1.dates.dt.year.eq(df1.shift_backward.dt.year(df1.shift_forward.dt.year))

but not sure how can I use this min_pd and min_nd as date offset for each subject.

I expect my output to be like as shown below

enter image description here

Upvotes: 1

Views: 54

Answers (1)

anky
anky

Reputation: 75080

you can first do a transform min with groupby and then your calculation would be easy with pd.to_timedelta:

#get groupby min transformed to the length of the dataframe
min_pd_nd = df1.groupby('person_id')['pd','nd'].transform('min')

df1['Shift_backward'] = df1['dates'].sub(pd.to_timedelta(min_pd_nd['pd'],unit='d'))
df1['Shift_Forward'] = df1['dates'].add(pd.to_timedelta(min_pd_nd['nd'],unit='d'))
#check if in a given row all year are unique 
c = (df1[['dates','Shift_backward','Shift_Forward']].stack(dropna=False).dt.year
     .groupby(level=0).nunique())
df1['year_change'] = np.where(c.gt(1),'Yes','No')

print(df1[['person_id','dates','Shift_backward','Shift_Forward','year_change']])

   person_id      dates Shift_backward Shift_Forward year_change
0         11 1961-12-30     1961-06-10    1961-12-31          No
1         21 1967-05-29     1967-01-01    1967-12-31          No
2         31 1957-02-03     1957-01-01    1957-12-31          No
3         41 1959-07-27     1959-01-01    1959-12-31          No
4         51 1971-01-13     1971-01-01    1971-12-31          No
5         11 2017-07-23     2017-01-01    2017-07-24          No
6         21 2017-05-29     2017-01-01    2017-12-31          No
7         31 2015-02-03     2015-01-01    2015-12-31          No
8         41        NaT            NaT           NaT          No
9         51        NaT            NaT           NaT          No

Upvotes: 1

Related Questions