Reputation: 130
Hi everyone I have tried to look everywhere for this issue but I cannot figure a solution out. I'd be glad if you'd help me.
So, basically I have this dataset:
df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
"col2": [ 2020, 1994, 2013, 1000, 1996, 2021]})
df.dtypes
col1 object
col2 int64
dtype: object
and I want to order the first column with a costum order and the second column with ascending order. The final result should be the following:
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
0 xxx 1994
1 xxx 2013
2 xxx 2020
So, in order to accomplish that I do this:
d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} # to customize order
df.sort_values(by = ['col1' , 'col2'], key = lambda x: x.map(d))
but I end up with this:
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
0 xxx 2020
1 xxx 1994
2 xxx 2013
If I only do:
df.sort_values(by = ['col1' , 'col2'])
col1 col2
3 kkk 1000
4 www 1996
5 www 2021
1 xxx 1994
2 xxx 2013
0 xxx 2020
The col2 is ordered fine. I really don't know why I am having this issue. Has anyone experienced something similar? Thanks
Upvotes: 5
Views: 584
Reputation: 491
create new column, map it to your dictionary, sort by it, then delete :)
df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
"col2": [ 2020, 1994, 2013, 1000, 1996, 2021]})
d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} # to customize order
df['sorting'] = df['col1'].map(d)
df = df.sort_values('sorting')
del df['sorting']
Upvotes: 1
Reputation: 863156
Possible trick is expand dictionary by values from col2
:
d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} # to customize order
d = {**d, **dict(zip(df.col2, df.col2))}
df = df.sort_values(by = ['col1' , 'col2'], key = lambda x: x.map(d))
print (df)
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
1 xxx 1994
2 xxx 2013
0 xxx 2020
Or solution with get
- if no match is returned same value instead NaN
:
df = df.sort_values(by = ['col1' , 'col2'], key = lambda x: x.map(lambda y: d.get(y, y)))
print (df)
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
1 xxx 1994
2 xxx 2013
0 xxx 2020
Solution with helper column:
d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0}
df = df.assign(new=df['col1'].map(d)).sort_values(by=['new','col2']).drop('new', axis=1)
print (df)
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
1 xxx 1994
2 xxx 2013
0 xxx 2020
Upvotes: 1
Reputation: 7913
Well, it's not a bug. What do you expect from key when passing it to multiple columns? It will be applied to each of columns and used as sorting key. You don't have any of col2 elements in d, so it returns None (I think so because no error is raised, thus it may use d.get(x, None)
). For your use case the most simple solution will be sorting twice:
df.sort_values(by='col2').sort_values(by='col1', key=lambda x: x.map(d))
I have also checked whether my assumptions were correct. If you set key=x.map(lambda t: d[t])
instead of x.map(d)
in your code example, error is raised saying that key is unmatched. So you really try to apply it to both columns and sort by results.
Upvotes: 1
Reputation: 9207
Pandas has a datatype "categorical" which can do that:
import pandas as pd
df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
"col2": [ 2020, 1994, 2013, 1000, 1996, 2021]})
df["col1"] = pd.Categorical(df["col1"], ordered=True,
categories=['www', 'zzz', 'kkk', 'jjj', 'xxx', 'yyy'])
df.sort_values(["col1", "col2"], ascending = (True, True))
col1 col2
4 www 1996
5 www 2021
3 kkk 1000
1 xxx 1994
2 xxx 2013
0 xxx 2020
Upvotes: 1
Reputation: 31206
Change col1 to be a categorical where it is ordered.
df = pd.DataFrame({"col1": ['xxx', 'xxx', 'xxx', 'kkk', 'www', 'www'],
"col2": [ 2020, 1994, 2013, 1000, 1996, 2021]})
d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0} # to customize order
df["col1"] = pd.Categorical(df["col1"], sorted(list(d.keys()), key=lambda x: d[x]), ordered=True)
df.sort_values(["col1","col2"])
col1 | col2 | |
---|---|---|
4 | www | 1996 |
5 | www | 2021 |
3 | kkk | 1000 |
1 | xxx | 1994 |
2 | xxx | 2013 |
0 | xxx | 2020 |
Upvotes: 1
Reputation: 14949
Try this. -
df =df.groupby('col1')['col2'].apply(sorted).explode().reset_index()
d = {'xxx': 4, 'zzz':1, 'yyy':5, 'kkk':2, 'jjj':3, 'www':0}
df['sorted'] = df['col1'].map(d)
df = df.sort_values('sorted').drop('sorted', axis=1)
Upvotes: 1