Reputation: 158
The dataframe (df) looks like:
Date Caller Called
0 2011-01-01 00:00:00 Sarah Claire
1 2011-01-01 00:00:00 Sarah Ryan
2 2011-01-01 00:00:00 Sarah Alex
3 2011-01-02 00:00:00 Sarah Max
4 2011-01-02 00:00:00 Sarah Phoebe
number_date = df.groupby(['Caller',pd.DatetimeIndex(df['Date']).normalize()])
number_date['Called'].unique().apply(lambda x: len(x))
This does what I expect. She made 3 calls on the 1st and 2 on the second:
Caller
Sarah 2011-01-01 3
2011-01-02 2
How can I modify this to be:
Caller MaxCallsOneDay
Sarah 3
Upvotes: 3
Views: 763
Reputation: 109546
Group again on your existing dataframe, using level=[0]
to specify the the first level in the index for grouping (ie. the Caller):
# existing dataframe
df = pd.DataFrame(
{'Called': ['Claire', 'Ryan', 'Ryan', 'Ryan1', 'Ryan', 'Alex', 'Max'],
'Caller': ['Sarah', 'Sarah', 'John', 'John', 'John', 'Sarah', 'Sarah'],
'Date': ['2011-01-01'] * 6 + ['2011-01-02']})
# Group on dataframe.
gb = df.groupby(['Caller', pd.DatetimeIndex(df['Date']).normalize()]).Called.count()
>>> gb.groupby(level=[0]).max()
Caller
John 3
Sarah 3
Name: Called, dtype: int64
Upvotes: 0
Reputation: 862661
You can try nunique
and nlargest
:
print df
Date Caller Called
0 2011-01-01 Sarah Claire
1 2011-01-01 Sarah Ryan
2 2011-01-01 Sarah1 Ryan
3 2011-01-01 Sarah1 Ryan1
4 2011-01-01 Sarah1 Ryan
5 2011-01-01 Sarah Alex
6 2011-01-02 Sarah Max
number_date=df.groupby(['Caller',pd.DatetimeIndex(df['Date']).normalize()])['Called'].nunique()
.groupby(level=0).nlargest(1).reset_index(level=1, drop=True).reset_index()
number_date.columns = ['Caller','Date', 'MaxCallsOneDay']
print number_date
Caller Date MaxCallsOneDay
0 Sarah 2011-01-01 3
1 Sarah1 2011-01-01 2
If you dont need column Date
, use max
:
df = df.groupby(['Caller',pd.DatetimeIndex(df['Date']).normalize()])['Called'].nunique()
.groupby(level=0).max().reset_index(name='MaxCallsOneDay')
print df
Caller MaxCallsOneDay
0 Sarah 3
1 Sarah1 2
Timings - len(df) = 7k
:
In [531]: %timeit df.groupby(['Caller',pd.DatetimeIndex(df['Date']).normalize()])['Called'].unique().apply(lambda x: len(x)).groupby(level=0).nlargest(1).reset_index(level=1, drop=True).reset_index()
The slowest run took 4.80 times longer than the fastest. This could mean that an intermediate result is being cached
10 loops, best of 3: 8.58 ms per loop
In [532]: %timeit df.groupby(['Caller',pd.DatetimeIndex(df['Date']).normalize()])['Called'].nunique().groupby(level=0).nlargest(1).reset_index(level=1, drop=True).reset_index()
100 loops, best of 3: 7.07 ms per loop
In [547]: %timeit df.groupby(['Caller',pd.DatetimeIndex(df['Date']).normalize()])['Called'].nunique().groupby(level=0).max().reset_index(name='MaxCallsOneDay')
100 loops, best of 3: 3.52 ms per loop
Upvotes: 1