Carlos Juvenal

Reputation: 17

Using Groupby and Apply - How to reorganize code

I have a DataFrame that lists concentration values for a set of points, with each point identified in the Ponto column, like this:

                 AVERAGE CONC   Data               Dia        Hora           Ponto
Data                    
2018-01-01 00:00:00 0.10205 2018-01-01 00:00:00 2018-01-01  00:00:00    510089.0/7770042.0
2018-01-01 00:00:00 0.27263 2018-01-01 00:00:00 2018-01-01  00:00:00    510589.0/7770042.0
2018-01-01 00:00:00 0.38072 2018-01-01 00:00:00 2018-01-01  00:00:00    511089.0/7770042.0
2018-01-01 00:00:00 0.53142 2018-01-01 00:00:00 2018-01-01  00:00:00    511589.0/7770042.0
2018-01-01 00:00:00 0.44083 2018-01-01 00:00:00 2018-01-01  00:00:00    512089.0/7770042.0
... ... ... ... ... ...
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31  23:00:00    513089.0/7774542.0
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31  23:00:00    513589.0/7774542.0
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31  23:00:00    514089.0/7774542.0
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31  23:00:00    514589.0/7774542.0
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31  23:00:00    512339.0/7772292.0

I had to group all the concentrations by Ponto so that I can calculate the correlation by date (Data). I used a tip I found in another question here and applied groupby, as below:

df1 = df.groupby('Ponto')['AVERAGE CONC'].apply(lambda df: df.reset_index(drop=True)).unstack(0)

But now I have a problem: in the new DataFrame, where the Ponto values became columns, the columns are sorted by name. I don't want that. I want to keep the order in which the points appear in the first DataFrame, because in my project I use the last point listed as the reference point for the correlation.

In other words, the last column of the new df1 must be the last point listed in df:

512339.0/7772292.0

As it stands, the last column is whichever name comes last in ascending sort order.

Upvotes: 1

Views: 70

Answers (1)

n1colas.m

Reputation: 3989

You can use the pandas GroupBy.indices property, which returns a dictionary mapping each group name to the positions of that group's rows in the original DataFrame. Sort the group names by those positions and keep only the names: the result is the unique values of Ponto in the order they first appear, which you can then use to reorder the columns of the unstacked DataFrame (df_out = df1[new_columns_order]).
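
To make the idea concrete before applying it to the real data, here is a minimal sketch of what indices holds. The frame `toy` and its labels "A", "B", "C" are made up for illustration and are not taken from the question's file.

```python
import pandas as pd

# Made-up toy data (not the asker's file): three points, two rows each,
# listed in the order B, C, A.
toy = pd.DataFrame({
    "Ponto": ["B", "C", "A", "B", "C", "A"],
    "AVERAGE CONC": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

indices = toy.groupby("Ponto").indices

# Each group name maps to the integer positions of its rows in `toy`.
positions = {name: idx.tolist() for name, idx in indices.items()}
print(positions)

# Sorting the group names by the position of their first row
# recovers the order in which the points first appear.
first_seen = sorted(indices, key=lambda name: indices[name][0])
print(first_seen)
```

Sorting on the first position only keeps the comparison between plain integers, which matters once a group has more than one row.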

import pandas as pd

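# Raw string so the regex separator is not read as a string escape
df = pd.read_csv('sample.csv', sep=r'\s+')

g = df.groupby('Ponto')
df1 = g['AVERAGE CONC'].apply(lambda s: s.reset_index(drop=True)).unstack(0)

# Order the group names by the position of each group's first row;
# comparing whole index arrays fails when a group has several rows
idx = g.indices
new_columns_order = sorted(idx, key=lambda name: idx[name][0])
df_out = df1[new_columns_order]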

print(df_out.columns)

Output from df_out.columns

Index(['510089.0/7770042.0', '510589.0/7770042.0', '511089.0/7770042.0',
       '511589.0/7770042.0', '512089.0/7770042.0', '513089.0/7774542.0',
       '513589.0/7774542.0', '514089.0/7774542.0', '514589.0/7774542.0',
       '512339.0/7772292.0'],
      dtype='object', name='Ponto')
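
As a possible alternative, and not part of the approach above, Series.unique() returns values in order of first appearance, so it may be enough to reorder the columns without touching the groupby object. The sketch below assumes a made-up frame `toy` with labels "A", "B", "C" standing in for the real Ponto values.

```python
import pandas as pd

# Made-up toy data (not the asker's file), points listed as B, C, A.
toy = pd.DataFrame({
    "Ponto": ["B", "C", "A", "B", "C", "A"],
    "AVERAGE CONC": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# Same reshaping as in the question: one column per point.
wide = (
    toy.groupby("Ponto")["AVERAGE CONC"]
       .apply(lambda s: s.reset_index(drop=True))
       .unstack(0)
)

# unique() keeps the order of first appearance, unlike the sorted columns.
order = list(toy["Ponto"].unique())
wide_ordered = wide[order]

print(list(wide.columns))
print(list(wide_ordered.columns))
```

With this ordering the last column is the last point to appear for the first time in the original frame, which is the reference point the question asks for.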

Upvotes: 1
