Reputation: 771
I have a dataframe containing a column which contains a list of dates. The length of dates can range (2+ dates). I was hoping to create a new column that contains the number of days between the minimum and maximum of dates in this list and am not entirely sure the best way to do this? Any help would be greatly appreciated!
data = [
["Item_1", ["2020-06-01", "2020-06-02", "2020-07-05"]],
["Item_2", ["2018-04-15", "2018-04-22"]],
["Item_3", ["2016-02-15", "2016-02-22", "2016-03-05", "2016-04-01"]],
]
df = pd.DataFrame(data, columns=["Item_ID", "Dates"])
df
Upvotes: 2
Views: 393
Reputation: 26686
There are many ways
Option 1: Keep it numpy and one liner
df['Lapse'] =df.agg(lambda x: np.ptp(np.array(x['Dates'], dtype='datetime64')), axis=1)
Option 2: Go the long way
Explode
Coerce date to date time
Find differences of the extremes using np.ptpt
df=df.explode('Dates')
df['Dates']=pd.to_datetime(df['Dates'], format='%d,%m,%Y')
df.groupby('Item_ID').agg(lapse= ('Dates', np.ptp), Dates=('Dates', list))
Upvotes: 1
Reputation: 35686
We can Series.explode
the Dates
column, convert to_datetime
, then groupby agg
to find the min and max dates per group, take the diff
of each group, and assign the result back to a new column:
df['Duration'] = (
# explode lists into usable Series and convert to Datetime
pd.to_datetime(df['Dates'].explode())
.groupby(level=0).agg(['min', 'max']) # Get min and max per group
.diff(axis=1) # Diff across rows
.iloc[:, -1] # Get the resulting difference
)
If the lists are guaranteed to be sorted, we can simply subtract the last value in the list from the first to get the duration after converting to_datetime
:
df['Duration'] = (
# get last value in list and subtract from first value
# after converting each to datetime
pd.to_datetime(df['Dates'].str[-1]) - pd.to_datetime(df['Dates'].str[0])
)
Both options produce df
:
Item_ID Dates Duration
0 Item_1 [2020-06-01, 2020-06-02, 2020-07-05] 34 days
1 Item_2 [2018-04-15, 2018-04-22] 7 days
2 Item_3 [2016-02-15, 2016-02-22, 2016-03-05, 2016-04-01] 46 days
Upvotes: 3