Reputation: 585
I have a dataframe like this:
Produtos Estoque total Valor Total de estoque
0 70 10000 7180
1 70 2800000 2011550
2 70 125000 89800
3 71 540000 530980
4 71 89000 79280
5 84 205000 572770
... ... ... ...
14988 1003254 46000 1329400
14989 1003273 30570000 5502600
14990 1003274 62000000 3720000
14991 1003275 200000000 3840000
14992 1003276 710000 2108700
14993 1003279 6750000 715330
I am trying to sort the 'Produtos' column taking in consideration the first digit first (and if equality, the second, and so on), like:
100
1001
1002
10003
10004
100000
200
2001
20002
I found that I should use this command:
line.sort(key=lambda line: int(line.split()[0]))
But I am having difficult to use in a correct way.
Upvotes: 2
Views: 630
Reputation: 1127
As usual, answer from @jezrael is detailed and superb. But I found another option available and I think it's worth sharing:
We can just use key
arg in sort_values:
df.sort_values(by = ['Estoque'], key = lambda x:x.astype(str).str[0])
(Sorry I couldn't get the columns lined up with pd.read_clipboard())
The result
Upvotes: 0
Reputation: 863301
First get values of first values by indexing by str, get positions by argsort
and last reorder by iloc
:
df = df.iloc[df['Produtos'].astype(str).str[0].argsort()]
print (df)
Produtos Estoque total Valor Total de estoque
14988 1003254 46000 1329400
14989 1003273 30570000 5502600
14990 1003274 62000000 3720000
14991 1003275 200000000 3840000
14992 1003276 710000 2108700
14993 1003279 6750000 715330
0 70 10000 7180
1 70 2800000 2011550
2 70 125000 89800
3 71 540000 530980
4 71 89000 79280
5 84 205000 572770
EDIT: Sorting by first value and also length is possible this trick - select by indices by helper DataFrame
by first value with len
:
print (df)
Produtos Estoque total Valor Total de estoque
0 70 10000 7180.0
1 70 2800000 2011550.0
2 71 125000 89800.0
3 710 540000 530980.0
4 7100 89000 79280.0
5 84 205000 572770.0
14988 10032546000 1329400 NaN
14989 10032 30570000 5502600.0
14990 1003 62000000 3720000.0
14991 100 200000000 3840000.0
14992 10 710000 2108700.0
14993 1003279 6750000 715330.0
s = df['Produtos'].astype(str)
i = pd.DataFrame(np.c_[s.str[0].astype(int), s.str.len()]).sort_values([0,1]).index
print (i)
Int64Index([10, 9, 8, 7, 11, 6, 0, 1, 2, 3, 4, 5], dtype='int64')
df = df.iloc[i]
print (df)
Produtos Estoque total Valor Total de estoque
14992 10 710000 2108700.0
14991 100 200000000 3840000.0
14990 1003 62000000 3720000.0
14989 10032 30570000 5502600.0
14993 1003279 6750000 715330.0
14988 10032546000 1329400 NaN
0 70 10000 7180.0
1 70 2800000 2011550.0
2 71 125000 89800.0
3 710 540000 530980.0
4 7100 89000 79280.0
5 84 205000 572770.0
Upvotes: 3