Moondra
Moondra

Reputation: 4511

Sort/Alphabetzing Columns in Dataframes (Pandas)

I'm having trouble alphabetizing the names in the first and second column of a dataframe.

The dataframe looks something like this:

          Boys       Females
Rank                        
1         Michael   Jennifer
2     Christopher    Jessica
3         Matthew     Amanda
4           Jason      Sarah
5           David    Melissa
6          Joshua        Amy
7           James     Nicole
8            John  Stephanie
9          Robert  Elizabeth
10         Daniel    Heather
11         Joseph   Michelle
12         Justin    Rebecca
13           Ryan   Kimberly
14          Brian    Tiffany

And I want it look like this: (The Boys and Females name are in alphabetical order)

 Rank     Boys                 Rank    Females
 14       Brian                  3        Amanda
  2       Christopher            6        Amy
 10       Daniel                 9        Elizabeth
  5       David                 10        Heather
  7       James                  1        Jennifer

I've played around with sort and sort_value but the columns are not changing. Here is my original code

import pandas as pd



df = pd.read_html("file:///C:/Python27/babyname999.html")

df2 =df[0]  # creating a data frame from the above list of dateframes

df2.rename(columns = {'0': 'Rank', '1': 'Boys', '2': 'Females'}, inplace = True)
del df2['Unnamed: 0']

#renaming columns of dataframe


df2.set_index('Rank', inplace = True)  #setting index of dataframe to 'Rank'

I've played around with sort and sort_value but the columns are not changing. I got no where. Any suggestions?

Thanks!

Upvotes: 3

Views: 1332

Answers (3)

milos.ai
milos.ai

Reputation: 3930

Here's working example of sorting.

import pandas as pd
from io import StringIO

data_file = StringIO(u"""Rank       Boys       Females
1         Michael   Jennifer
2     Christopher    Jessica
3         Matthew     Amanda
4           Jason      Sarah
5           David    Melissa
6          Joshua        Amy
7           James     Nicole
8            John  Stephanie
9          Robert  Elizabeth
10         Daniel    Heather
11         Joseph   Michelle
12         Justin    Rebecca
13           Ryan   Kimberly
14          Brian    Tiffany""")

df = pd.read_table(data_file, delim_whitespace=True)

boys = df[['Rank','Boys']].sort_values(['Boys']).rename(columns={'Rank': 'Rank_boys'})
females = df[['Rank','Females']].sort_values(['Females']).rename(columns={'Rank': 'Rank_females'})
result = pd.concat([boys.reset_index(drop=True), females.reset_index(drop=True)], axis=1)

And result would be:

Rank_boys Boys Rank_females Females
0   14  Brian   3   Amanda
1   2   Christopher 6   Amy
2   10  Daniel  9   Elizabeth
3   5   David   10  Heather
4   7   James   1   Jennifer
5   4   Jason   2   Jessica
6   8   John    13  Kimberly
7   11  Joseph  5   Melissa
8   6   Joshua  11  Michelle
9   12  Justin  7   Nicole
10  3   Matthew 12  Rebecca
11  1   Michael 4   Sarah
12  9   Robert  8   Stephanie
13  13  Ryan    14  Tiffany

Upvotes: 3

piRSquared
piRSquared

Reputation: 294218

The problem with sorting different columns of a dataframe independently is that pandas will just use the index of each independently sorted column and realign them, thus destroying your sorting efforts. You must instead, sort and return the values of the resulting sorted series... enough talk, an example will explain much more.

Assume df is your example dataframe. Then

df.apply(lambda x: x.sort_values().values)

enter image description here


To get something close to what you asked for

lst = [df[c].sort_values().reset_index(name='Name') for c in df]
keys = df.columns
pd.concat(lst, axis=1, keys=keys)

enter image description here

Upvotes: 2

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210822

IIUC (it's hard as you didn't post expected/desired DF) you can do it this way:

df = (pd.read_html("file:///C:/Python27/babyname999.html")[0]
        .rename(columns = {'0': 'Rank', '1': 'Boys', '2': 'Females'})
        .drop('Unnamed: 0', 1)
        .set_index('Rank')
)

then:

In [86]: df['Rank_Boys'], df['Rank_Females'] = df.sort_values('Boys').index, df.sort_values('Females').index

In [87]: df
Out[87]:
           Boys    Females  Rank_Boys  Rank_Females
1       Michael   Jennifer         14             3
2   Christopher    Jessica          2             6
3       Matthew     Amanda         10             9
4         Jason      Sarah          5            10
5         David    Melissa          7             1
6        Joshua        Amy          4             2
7         James     Nicole          8            13
8          John  Stephanie         11             5
9        Robert  Elizabeth          6            11
10       Daniel    Heather         12             7
11       Joseph   Michelle          3            12
12       Justin    Rebecca          1             4
13         Ryan   Kimberly          9             8
14        Brian    Tiffany         13            14

Upvotes: 2

Related Questions