heartbit
heartbit

Reputation: 117

Pandas dataframe not properly sorted

I have several excel data files, each one referring to a different timing (i.e. 0h, 24h, 48h, ...), and the columns with the data of interest are named: 'Product' and 'Value'. I have concatenated those files by using the following:

result = pd.concat([pd.read_excel(file) for file in filenames], keys=t_list, names=['t'])

where filenames is a list containing the excel files, t_list is a list containing the timings, 't' is the name of the newly created column with the timings. So far so good, and I get a new dataframe with the following structure:

Concatenated Dataframe:
concatenated dataframe

But then, if I sort with the following:

result['Product'].astype(str)
result.sort_values('Product', ascending=True)

I found the correct order for some items, but not for others. Indeed, I get something like the following:

Dataframe not properly sorted:
dataframe not properly sorted

Can someone shed a light on this?

Upvotes: 0

Views: 548

Answers (2)

Daweo
Daweo

Reputation: 36370

Can someone shed a light on this?

Default sorting algorithm of sort_values is not stable, therefore order of elements with equal Product might be different than before sorting. To avoid that you might elect to use kind="mergesort" in sort_values.

Upvotes: 1

yakir0
yakir0

Reputation: 184

Instead of passing 'Products', pass a list of the columns you want to sort by.

If you want to sort by 'Product' and then by 'Value', it looks like this:

result.sort_values(['Product', 'Value'], ascending=True)

Upvotes: 0

Related Questions