Reputation: 23
I'm trying to group by a column, find the minimum date value in that group and insert it in a new column for all values in that group.
The following:
d = {'one' : pd.Series(np.random.randn(6), index=pd.date_range('1/1/2011', periods=6, freq='H')),
'two' : pd.Series(["A", "B", "C"] * 2, index=pd.date_range('1/1/2011', periods=6, freq='H'))}
df = pd.DataFrame(d)
df['ts'] = df.index
df['min_date'] = df.groupby('two')['ts'].min()
df
gives me this output:
Out[7]:
one two ts min_date
2011-01-01 00:00:00 1.676829 A 2011-01-01 00:00:00 NaT
2011-01-01 01:00:00 -0.490976 B 2011-01-01 01:00:00 NaT
2011-01-01 02:00:00 -1.934902 C 2011-01-01 02:00:00 NaT
2011-01-01 03:00:00 -0.625931 A 2011-01-01 03:00:00 NaT
2011-01-01 04:00:00 1.534645 B 2011-01-01 04:00:00 NaT
2011-01-01 05:00:00 0.123045 C 2011-01-01 05:00:00 NaT
[6 rows x 4 columns]
Where I would like to have:
Out[7]:
one two ts min_date
2011-01-01 00:00:00 1.676829 A 2011-01-01 00:00:00 2011-01-01 00:00:00
2011-01-01 01:00:00 -0.490976 B 2011-01-01 01:00:00 2011-01-01 01:00:00
2011-01-01 02:00:00 -1.934902 C 2011-01-01 02:00:00 2011-01-01 02:00:00
2011-01-01 03:00:00 -0.625931 A 2011-01-01 03:00:00 2011-01-01 00:00:00
2011-01-01 04:00:00 1.534645 B 2011-01-01 04:00:00 2011-01-01 01:00:00
2011-01-01 05:00:00 0.123045 C 2011-01-01 05:00:00 2011-01-01 02:00:00
[6 rows x 4 columns]
Grouping on column two
so the first occurence of A is set in min_date for all A entries, etc.
Upvotes: 2
Views: 272
Reputation: 353119
I think you want the transform
method:
>>> df['min_date'] = df.groupby('two')['ts'].transform("min")
>>> df
one two ts min_date
2011-01-01 00:00:00 0.574285 A 2011-01-01 00:00:00 2011-01-01 00:00:00
2011-01-01 01:00:00 -0.200439 B 2011-01-01 01:00:00 2011-01-01 01:00:00
2011-01-01 02:00:00 0.549725 C 2011-01-01 02:00:00 2011-01-01 02:00:00
2011-01-01 03:00:00 1.187299 A 2011-01-01 03:00:00 2011-01-01 00:00:00
2011-01-01 04:00:00 0.770180 B 2011-01-01 04:00:00 2011-01-01 01:00:00
2011-01-01 05:00:00 -0.448781 C 2011-01-01 05:00:00 2011-01-01 02:00:00
[6 rows x 4 columns]
It's for performing an aggregation operation and then broadcasting the result up to the entire group.
Upvotes: 2