Reputation: 1
I am trying to sort a pandas DataFrame by a numeric column ('Firstgift'), but it appears to sort the values as text, comparing characters from the left, rather than by numeric value:
ID # Firstgift Firstgiftdate Lastgift Lastgiftdate
180 25,942,055.00 93,000.00 3/27/2015 93,000.00 3/27/2015
237 25,972,246.00 9,921.26 12/8/2014 9,921.26 12/8/2014
112 25,836,557.00 9,565.63 12/11/2014 9,565.63 12/11/2014
49 21,221,574.00 9,340.57 5/27/2015 1,154.00 7/2/2015
0 20,251,509.00 9,304.58 4/21/2015 9,304.58 4/21/2015
6 20,780,436.00 8,149.00 5/20/2015 8,149.00 5/20/2015
430 26,011,859.00 8,000.00 12/28/2014 8,000.00 12/28/2014
377 26,004,400.00 8,000.00 12/31/2014 100.00 4/28/2015
227 25,969,658.00 75,000.00 2/6/2015 75,000.00 2/6/2015
478 26,031,770.00 70,000.00 2/9/2015 70,000.00 2/9/2015
617 26,100,302.00 7,500.00 4/29/2015 7,500.00 4/29/2015
677 26,108,994.00 7,500.00 5/4/2015 7,500.00 5/4/2015
56 21,306,073.00 7,469.08 6/16/2015 7,469.08 6/16/2015
7 20,780,563.00 7,342.48 5/19/2015 7,342.48 5/19/2015
Code was:
import pandas as pd
import sklearn
pd.set_option('display.expand_frame_repr',True)
raw = pd.read_table('MG_FG_TEST.txt',sep="\t")
Firstgift = raw.sort('Firstgift', ascending=False)
What am I missing?
Upvotes: 0
Views: 190
Reputation: 4605
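I think you need to use the thousands argument so that pandas reads those numbers in as floats rather than strings. Because the values contain comma separators, read_table otherwise keeps them as strings, and strings sort character by character: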
raw = pd.read_table('MG_FG_TEST.txt',sep="\t",thousands=',')
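To check the effect without the original file, here is a minimal sketch. It assumes a small made-up tab-separated sample (a few rows loosely based on the table in the question) and reads it from memory with read_csv and sep="\t". It also uses sort_values, because DataFrame.sort was removed in later pandas versions.

```python
import io

import pandas as pd

# Hypothetical sample standing in for MG_FG_TEST.txt (tab-separated).
sample = (
    "ID #\tFirstgift\tFirstgiftdate\n"
    "25,942,055.00\t93,000.00\t3/27/2015\n"
    "25,972,246.00\t9,921.26\t12/8/2014\n"
    "25,969,658.00\t75,000.00\t2/6/2015\n"
    "26,100,302.00\t7,500.00\t4/29/2015\n"
)

# Without thousands=',' the comma-formatted values stay as text.
as_text = pd.read_csv(io.StringIO(sample), sep="\t")

# With thousands=',' pandas strips the separators and parses numbers.
as_numbers = pd.read_csv(io.StringIO(sample), sep="\t", thousands=",")

print(pd.api.types.is_numeric_dtype(as_text["Firstgift"]))
print(pd.api.types.is_numeric_dtype(as_numbers["Firstgift"]))

# sort_values replaces the older DataFrame.sort method.
sorted_gifts = as_numbers.sort_values("Firstgift", ascending=False)
print(sorted_gifts[["Firstgift", "Firstgiftdate"]])
```

If the column is numeric after loading, the sort should put 93,000.00 first and 7,500.00 last instead of grouping values by their leading digit.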
Upvotes: 1