Reputation: 45
I am having a table called 'data' in that the values will be like following,
ID NAME DOB LOCATION
1 bob 08/10/1985 NEW JERSEY
1 bob 15/09/1987 NEW YORK
2 John 08/10/1985 NORTH CAROLINA
2 John 26/11/1990 OKLAHOMA
For example I want output like,
ID NAME No.of.Days
1 bob difference of two given dates in days
2 John difference of two given dates in days
Please help me to form a python code to get the expected output.
Upvotes: 2
Views: 259
Reputation: 847
First, You need to convert Date
column into date
format. Lets suppose you are reading from .csv
then read your .csv
file as follows
df = pd.read_csv('yourfile.csv', parse_dates = ['DOB'])
otherwise, convert your existing dataframe column into date
format as follows.
df['DOB'] = pd.to_datetime(df['DOB'])
now, you can perform the usual numeric operations.
df.groupby(['ID','NAME'])['DOB'].apply(lambda x: abs(pd.to_datetime(list(x)[0]) - pd.to_datetime(list(x)[1]))).reset_index(name='No.Of.Days')
Upvotes: 1
Reputation: 3770
If there will be only two dates in a for a given ID then below works!
df.groupby(['ID','NAME'])['DOB'].apply(lambda x: abs(pd.to_datetime(list(x)[0]) - pd.to_datetime(list(x)[1]))).reset_index(name='No.Of.Days')
Output
ID NAME No.Of.Days
0 1 bob 766 days
1 2 John 1934 days
you can use np.diff also
df.groupby(['ID','NAME'])['DOB'].apply(lambda x: np.diff(list(x))[0]).reset_index(name='No.Of.Days')
Upvotes: 1