Reputation: 4511
I'm having trouble alphabetizing the names in the first and second column of a dataframe.
The dataframe looks something like this:
Boys Females
Rank
1 Michael Jennifer
2 Christopher Jessica
3 Matthew Amanda
4 Jason Sarah
5 David Melissa
6 Joshua Amy
7 James Nicole
8 John Stephanie
9 Robert Elizabeth
10 Daniel Heather
11 Joseph Michelle
12 Justin Rebecca
13 Ryan Kimberly
14 Brian Tiffany
And I want it look like this: (The Boys and Females name are in alphabetical order)
Rank Boys Rank Females
14 Brian 3 Amanda
2 Christopher 6 Amy
10 Daniel 9 Elizabeth
5 David 10 Heather
7 James 1 Jennifer
I've played around with sort and sort_value but the columns are not changing. Here is my original code
import pandas as pd
df = pd.read_html("file:///C:/Python27/babyname999.html")
df2 =df[0] # creating a data frame from the above list of dateframes
df2.rename(columns = {'0': 'Rank', '1': 'Boys', '2': 'Females'}, inplace = True)
del df2['Unnamed: 0']
#renaming columns of dataframe
df2.set_index('Rank', inplace = True) #setting index of dataframe to 'Rank'
I've played around with sort and sort_value but the columns are not changing. I got no where. Any suggestions?
Thanks!
Upvotes: 3
Views: 1332
Reputation: 3930
Here's working example of sorting.
import pandas as pd
from io import StringIO
data_file = StringIO(u"""Rank Boys Females
1 Michael Jennifer
2 Christopher Jessica
3 Matthew Amanda
4 Jason Sarah
5 David Melissa
6 Joshua Amy
7 James Nicole
8 John Stephanie
9 Robert Elizabeth
10 Daniel Heather
11 Joseph Michelle
12 Justin Rebecca
13 Ryan Kimberly
14 Brian Tiffany""")
df = pd.read_table(data_file, delim_whitespace=True)
boys = df[['Rank','Boys']].sort_values(['Boys']).rename(columns={'Rank': 'Rank_boys'})
females = df[['Rank','Females']].sort_values(['Females']).rename(columns={'Rank': 'Rank_females'})
result = pd.concat([boys.reset_index(drop=True), females.reset_index(drop=True)], axis=1)
And result would be:
Rank_boys Boys Rank_females Females
0 14 Brian 3 Amanda
1 2 Christopher 6 Amy
2 10 Daniel 9 Elizabeth
3 5 David 10 Heather
4 7 James 1 Jennifer
5 4 Jason 2 Jessica
6 8 John 13 Kimberly
7 11 Joseph 5 Melissa
8 6 Joshua 11 Michelle
9 12 Justin 7 Nicole
10 3 Matthew 12 Rebecca
11 1 Michael 4 Sarah
12 9 Robert 8 Stephanie
13 13 Ryan 14 Tiffany
Upvotes: 3
Reputation: 294218
The problem with sorting different columns of a dataframe independently is that pandas will just use the index of each independently sorted column and realign them, thus destroying your sorting efforts. You must instead, sort and return the values of the resulting sorted series... enough talk, an example will explain much more.
Assume df
is your example dataframe. Then
df.apply(lambda x: x.sort_values().values)
To get something close to what you asked for
lst = [df[c].sort_values().reset_index(name='Name') for c in df]
keys = df.columns
pd.concat(lst, axis=1, keys=keys)
Upvotes: 2
Reputation: 210822
IIUC (it's hard as you didn't post expected/desired DF) you can do it this way:
df = (pd.read_html("file:///C:/Python27/babyname999.html")[0]
.rename(columns = {'0': 'Rank', '1': 'Boys', '2': 'Females'})
.drop('Unnamed: 0', 1)
.set_index('Rank')
)
then:
In [86]: df['Rank_Boys'], df['Rank_Females'] = df.sort_values('Boys').index, df.sort_values('Females').index
In [87]: df
Out[87]:
Boys Females Rank_Boys Rank_Females
1 Michael Jennifer 14 3
2 Christopher Jessica 2 6
3 Matthew Amanda 10 9
4 Jason Sarah 5 10
5 David Melissa 7 1
6 Joshua Amy 4 2
7 James Nicole 8 13
8 John Stephanie 11 5
9 Robert Elizabeth 6 11
10 Daniel Heather 12 7
11 Joseph Michelle 3 12
12 Justin Rebecca 1 4
13 Ryan Kimberly 9 8
14 Brian Tiffany 13 14
Upvotes: 2