Reputation: 17
I have a DataFrame that lists concentration values for a set of points, with each point identified in the Ponto column, like this:
AVERAGE CONC Data Dia Hora Ponto
Data
2018-01-01 00:00:00 0.10205 2018-01-01 00:00:00 2018-01-01 00:00:00 510089.0/7770042.0
2018-01-01 00:00:00 0.27263 2018-01-01 00:00:00 2018-01-01 00:00:00 510589.0/7770042.0
2018-01-01 00:00:00 0.38072 2018-01-01 00:00:00 2018-01-01 00:00:00 511089.0/7770042.0
2018-01-01 00:00:00 0.53142 2018-01-01 00:00:00 2018-01-01 00:00:00 511589.0/7770042.0
2018-01-01 00:00:00 0.44083 2018-01-01 00:00:00 2018-01-01 00:00:00 512089.0/7770042.0
... ... ... ... ... ...
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31 23:00:00 513089.0/7774542.0
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31 23:00:00 513589.0/7774542.0
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31 23:00:00 514089.0/7774542.0
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31 23:00:00 514589.0/7774542.0
2020-12-31 23:00:00 0.00000 2020-12-31 23:00:00 2020-12-31 23:00:00 512339.0/7772292.0
I had to group all the concentrations by Ponto so that I can calculate the correlation by date. I used a tip I found in another question here and used groupby, as below:
df1 = df.groupby('Ponto')['AVERAGE CONC'].apply(lambda df: df.reset_index(drop=True)).unstack(0)
But now I have a problem: in the new DataFrame, where the Ponto values became the columns, the columns are sorted by name. I don't want that. I want to keep the order in which the points first appear in the original DataFrame, because in my project the last point listed is the reference point for the correlation.
In other words, the last column of df1 must be the last point listed in df, which is 512339.0/7772292.0. As it stands, the last column is whichever point comes last in ascending sort order.
Upvotes: 1
Views: 70
Reputation: 3989
You can use the pandas groupby property GroupBy.indices, which returns a dictionary mapping each group name to the row positions of that group (in the order they appear in the original DataFrame). You can then sort the dictionary items by those positions (the first position of each group is enough) and keep only the sorted keys. The result is the original order of the unique values in Ponto, the column used in the groupby, and it can be used to reorder the columns of the unstacked DataFrame (df_out = df1[new_columns_order]).
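To make the idea concrete, here is a minimal sketch on a made-up toy frame (the point names `a/1`, `b/2`, `c/3` are hypothetical, not from the question's data). It shows that `indices` holds integer row positions per group and that sorting the group names by their first position gives back the order of first appearance.

```python
import pandas as pd

# Toy data (hypothetical): points appear in the order b/2, c/3, a/1
toy = pd.DataFrame({"Ponto": ["b/2", "c/3", "a/1", "b/2", "c/3", "a/1"]})

# Maps each group name to an array of row positions, e.g. a/1 -> [2, 5]
indices = toy.groupby("Ponto").indices

# Keep only the first row position of each group
first_pos = {name: int(pos[0]) for name, pos in indices.items()}

# Sort the group names by where they first appear in the frame
original_order = sorted(first_pos, key=first_pos.get)
print(original_order)  # ['b/2', 'c/3', 'a/1']
```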
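import pandas as pd
df = pd.read_csv('sample.csv', sep=r'\s+')
g = df.groupby('Ponto')
df1 = g['AVERAGE CONC'].apply(lambda s: s.reset_index(drop=True)).unstack(0)
# Sort the group names by the first row position of each group in df.
# (Using the whole index array as the sort key fails once a group has more than one row.)
new_columns_order = sorted(g.indices, key=lambda name: g.indices[name][0])
df_out = df1[new_columns_order]
print(df_out.columns)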
Output of print(df_out.columns):
Index(['510089.0/7770042.0', '510589.0/7770042.0', '511089.0/7770042.0',
'511589.0/7770042.0', '512089.0/7770042.0', '513089.0/7774542.0',
'513589.0/7774542.0', '514089.0/7774542.0', '514589.0/7774542.0',
'512339.0/7772292.0'],
dtype='object', name='Ponto')
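A possibly simpler variant, offered as a sketch rather than as part of the answer above: `Series.unique()` returns values in order of first appearance, so it can supply the column order directly. The toy frame and its point names below are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical): points appear in the order b/2, c/3, a/1
df = pd.DataFrame({
    "Ponto": ["b/2", "c/3", "a/1", "b/2", "c/3", "a/1"],
    "AVERAGE CONC": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# Same reshaping as in the question: one column per Ponto
wide = (
    df.groupby("Ponto")["AVERAGE CONC"]
      .apply(lambda s: s.reset_index(drop=True))
      .unstack(0)
)

# unique() keeps the order of first appearance in the original frame
order = list(df["Ponto"].unique())
wide_ordered = wide[order]

print(list(wide_ordered.columns))  # ['b/2', 'c/3', 'a/1']
```

With this ordering the reference point is the last column, so it can be selected with `wide_ordered.iloc[:, -1]`.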
Upvotes: 1