罗文浩
罗文浩

Reputation: 79

Select the last value in time after multiple groupings

I want to group ‘name’ first, then press ‘day’ to aggregate and select the last value of each ‘name’ every day.

I got some ideas from here:pandas - how to organised dataframe based on date and assign new values to column

I tried this, but I can't succeed. Is there any good way?

df = df.groupby(df['name']).resample('D',on='Timestamp').apply(['last'])

eg:

import pandas as pd

N = 9
rng = pd.date_range('2011-01-01', periods=N, freq='15S')
df = pd.DataFrame({'Timestamp': rng, 'name': ['A','A', 'B','B','B','B','C','C','C'],
                  'value': [1, 2, 3, 2, 3, 1, 3, 4, 3],'Temp': range(N)}) 
[out]:
    Timestamp           name    value   Temp
0   2011-01-01 00:00:00   A     1       0
1   2011-01-01 00:00:15   A     2       1
2   2011-01-01 00:00:30   B     3       2
3   2011-01-01 00:00:45   B     2       3
4   2011-01-01 00:01:00   B     3       4
5   2011-01-01 00:01:15   B     1       5
6   2011-01-01 00:01:30   C     3       6
7   2011-01-01 00:01:45   C     4       7
8   2011-01-01 00:02:00   C     3       8

I want to get these:

[out]:
           Timestamp    name    value   Temp
1   2011-01-01 00:00:15   A     2       1
5   2011-01-01 00:01:15   B     1       5
8   2011-01-01 00:02:00   C     3       8

Upvotes: 3

Views: 222

Answers (2)

jezrael
jezrael

Reputation: 862841

If need last values per days and per column name, use GroupBy.tail with Grouper:

df1 = df.groupby([pd.Grouper(freq='D', key='Timestamp'), 'name']).tail(1)
print (df1)
            Timestamp name  value  Temp
1 2011-01-01 00:00:15    A      2     1
5 2011-01-01 00:01:15    B      1     5
8 2011-01-01 00:02:00    C      3     8

Or convert values of Timestamp to dates by Series.dt.date:

df2 = df.groupby([df['Timestamp'].dt.date, 'name']).tail(1)
print (df2)
            Timestamp name  value  Temp
1 2011-01-01 00:00:15    A      2     1
5 2011-01-01 00:01:15    B      1     5
8 2011-01-01 00:02:00    C      3     8

There are also alternatives with Series.dt.normalize:

df2 = df.groupby([df['Timestamp'].dt.normalize(), 'name']).tail(1)

Or Series.dt.floor:

df2 = df.groupby([df['Timestamp'].dt.floor('D'), 'name']).tail(1)

Upvotes: 1

BENY
BENY

Reputation: 323306

IIUC

df.groupby('name').tail(1)
Out[25]: 
   Temp           Timestamp name  value
1     1 2011-01-01 00:00:15    A      2
5     5 2011-01-01 00:01:15    B      1
8     8 2011-01-01 00:02:00    C      3

Or

df.drop_duplicates('name',keep='last')
Out[26]: 
   Temp           Timestamp name  value
1     1 2011-01-01 00:00:15    A      2
5     5 2011-01-01 00:01:15    B      1
8     8 2011-01-01 00:02:00    C      3

Upvotes: 2

Related Questions