Reputation: 990
I have a dataframe like
df = id - date
ps1 - 2020-01-2
ps1 - 2020-01-3
ps2 - 2020-01-3
ps2 - 2020-01-5
I would like to get output like
output = id - date_diff
ps1 - 1
ps2 - 2
How can i group the ids and calculate the data differences ?
Upvotes: 0
Views: 32
Reputation: 26676
convert date
to datetime, diff
and drop
all the nulls.
df["date"] = pd.to_datetime(df["date"])
df["date_diff"] = df.groupby("id")["date"].diff().astype('timedelta64[D]')
df[df.date_diff.notnull()]
Upvotes: 2
Reputation: 22503
First coerce your date
column to datetime, and then groupby
and diff
:
df["date"] = pd.to_datetime(df["date"])
df["date_diff"] = df.groupby("id")["date"].diff()
print (df.drop_duplicates("id",keep="last"))
id date date_diff
1 ps1 2020-01-03 1 days
3 ps2 2020-01-05 2 days
Upvotes: 2