Reputation: 15
I have the following data in a pandas dataframe in Python. I would like to group the data based on the Name and Property column, and keep only the lowest value within the Date.
Before:
Name Property Date
0 X1 Cash 621 days 00:00:00.000000000
1 X1 Cash 256 days 00:00:00.000000000
2 X1 Cash 101 days 00:00:00.000000000
3 X1 Overflow 352 days 00:00:00.000000000
4 X1 Overflow 101 days 00:00:00.000000000
5 X1 Overflow 53 days 00:00:00.000000000
Repeat for X2/X3 etc.
After:
Name Property Date
2 X1 Cash 101 days 00:00:00.000000000
5 X1 Overflow 53 days 00:00:00.000000000
I am attempting to play around with the syntax but unable to get it right:
df1 = df1[df1.groupby(['Name', 'Property'])['Date'].min()]
Upvotes: 1
Views: 1241
Reputation:
If you call reset intex on the groupby.min() result, you'll have your desired output:
df.groupby(['Name', 'Property'])['Date'].min().reset_index()
Out:
Name Property Date
0 X1 Cash 101 days
1 X1 Overflow 53 days
You can also use nsmallest
:
df.groupby(['Name', 'Property'])['Date'].nsmallest(1)
Out:
Name Property
X1 Cash 2 101 days
Overflow 5 53 days
Name: Date, dtype: timedelta64[ns]
You can reset the index and drop the original index if you want:
df.groupby(['Name', 'Property'])['Date'].nsmallest(1).reset_index(level=[0, 1])
Out:
Name Property Date
2 X1 Cash 101 days
5 X1 Overflow 53 days
If you have additional columns you want to keep, you have several options.
df['additional column'] = np.arange(6)
df
Out:
Name Property Date additional column
0 X1 Cash 621 days 0
1 X1 Cash 256 days 1
2 X1 Cash 101 days 2
3 X1 Overflow 352 days 3
4 X1 Overflow 101 days 4
5 X1 Overflow 53 days 5
The first one, as @piRSquared mentioned, is to use loc
with the indexes of the returning rows:
df.loc[df.groupby(['Name', 'Property'])['Date'].idxmin()]
A second one would be to merge the original DataFrame and the aggregated one:
df.merge(df.groupby(['Name', 'Property'])['Date'].min().reset_index())
Both yield:
Name Property Date additional column
0 X1 Cash 101 days 2
1 X1 Overflow 53 days 5
Based on the use case, one may outperform the other.
Upvotes: 4
Reputation: 323326
You can try transform
in pandas
, also it will not remove your additional columns.
DF = df[df.groupby(['Property'])['Date'].transform(min) == df['Date']]
Name Property Date
2 X1 Cash 101 days
5 X1 Overflow 53 days
Upvotes: 1