tandem
tandem

Reputation: 2238

Group data based on column name pandas

In the example below, I want to first sort based on UID and then the TSTAMP for each TID.

In this context, here is a minimal working example I generated:

df = pd.read_csv(dataset_path, names = ['TID','UID','TSTAMP'], delimiter=';')
df = df.sort_values(by=['TID'], ascending=[True])
print df
#print df.groupby('UID').describe()

However, this does not groupby('UID') the way want to sort it.

       TID   UID         TSTAMP
22267   77  (!?}  1494417075666
9095    77  U|X^  1494415815098
15266   77  ~Mb{  1494416401082
15263   77  ~Mb{  1494416401061
15255   77  Qh9~  1494416398799
15252   77  Qh9~  1494416398786
15239   77  xF#u  1494416397542
15236   77  xF#u  1494416397540
9105    77  U|X^  1494415815197

Something like this as the final result:

     TID      UID           TSTAMP
22267   77   (!?}   1494417075666
15263   77   ~Mb{   1494416401061
15266   77   ~Mb{   1494416401082
15252   77   Qh9~   1494416398786
15255   77   Qh9~   1494416398799
9095    77   U|X^   1494415815098
9105    77   U|X^   1494415815197
15236   77   xF#u   1494416397540
15239   77   xF#u   1494416397542

I'm a learning pandas.. any help will be appreciated.

Thanks to @jezrael, here is the final solution

df = pd.read_csv(dataset_path, names = ['TID','UID','TSTAMP'], delimiter=';')
df = df.sort_values(['TID', 'TSTAMP', 'UID'], ascending=[True, False, True])
d = {'min':'TSTAMP-INIT','max':'TSTAMP-FIN'}
df = df.groupby(['UID','TID'])['TSTAMP'].agg([min, max]).reset_index().rename(columns=d)

for i, row in df.T.iteritems():
    print row

Upvotes: 0

Views: 35

Answers (1)

jezrael
jezrael

Reputation: 862511

It seems you need sort_values:

df = df.sort_values(['TID', 'TSTAMP', 'UID'], ascending=[True, False, True])
print (df)
       TID   UID         TSTAMP
22267   77  (!?}  1494417075666
15266   77  ~Mb{  1494416401082
15263   77  ~Mb{  1494416401061
15255   77  Qh9~  1494416398799
15252   77  Qh9~  1494416398786
15239   77  xF#u  1494416397542
15236   77  xF#u  1494416397540
9105    77  U|X^  1494415815197
9095    77  U|X^  1494415815098

If first column is not necessary sort, omit it:

df = df.sort_values(['TSTAMP', 'UID'], ascending=[False, True])
print (df)
       TID   UID         TSTAMP
22267   77  (!?}  1494417075666
15266   77  ~Mb{  1494416401082
15263   77  ~Mb{  1494416401061
15255   77  Qh9~  1494416398799
15252   77  Qh9~  1494416398786
15239   77  xF#u  1494416397542
15236   77  xF#u  1494416397540
9105    77  U|X^  1494415815197
9095    77  U|X^  1494415815098

Upvotes: 2

Related Questions