everestial

Why is pandas treating integer values as strings while sorting?

I am trying to sort a pandas DataFrame by the values in two columns. For some reason the integers are being treated as strings, even though they were still integers a few steps earlier in my code. I am not sure what caused the change. Here is the data:

df = 

contig  pos ref haplotype_block hap_X   hap_Y   odds_ratio  My_hap  Sp_hap
2   5207    T   1856    T   A   167.922 T   A
2   5238    G   1856    C   G   -   C   G
2   5723    A   1856    A   T   -   A   T
2   5867    C   1856    T   C   -   T   C
2   155667  G   2816    G   *   1.0 N   N
2   155670  T   2816    T   *   -   N   N
2   67910   C   2   C   T   0.21600000000000003 T   C
2   67941   A   2   A   T   -   T   A
2   68016   A   2   A   G   -   G   A
2   118146  C   132 T   C   1369.0  T   C
2   118237  A   132 C   A   -   C   A
2   118938  A   1157    T   A   0.002   A   T


df.sort_values(by=['contig', 'pos'], inplace=True, ascending=False)

print(df)  # gives:


contig  pos ref haplotype_block hap_X   hap_Y   odds_ratio  My_hap  Sp_hap
2   118146  C   132 T   C   1369.0  T   C
2   118237  A   132 C   A   -   C   A
2   118938  A   1157    T   A   0.002   A   T
2   155667  G   2816    G   *   1.0 N   N
2   155670  T   2816    T   *   -   N   N
2   5207    T   1856    T   A   167.922 T   A
2   5238    G   1856    C   G   -   C   G
2   5723    A   1856    A   T   -   A   T
2   5867    C   1856    T   C   -   T   C
......

So it is sorting the data only by the leading digits of the values in the sort columns (contig and pos), as if they were strings rather than numbers. Why is this happening, and what is a simple, memory-efficient way to fix it?

Thanks,

Details added in an edit:

print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 333 entries, 0 to 332
Data columns (total 9 columns):
contig             333 non-null int64
pos                333 non-null object
ref                333 non-null object
haplotype_block    333 non-null int64
hap_X              333 non-null object
hap_Y              333 non-null object
odds_ratio         333 non-null object
My_hap             333 non-null object
Sp_hap             333 non-null object
dtypes: int64(2), object(7)
memory usage: 23.5+ KB
None


Answers (1)

everestial

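The df.info() output shows that pos has dtype object, so its values are compared as strings when sorting. Convert the sort columns to integers first: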

df['contig'] = df['contig'].astype(int)
df['pos'] = df['pos'].astype(int)

Then sort in place:

df.sort_values(by=['contig', 'pos'], inplace=True, ascending=True)
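
To see why the conversion matters, here is a minimal sketch that compares the sort order before and after converting. The frame named demo is a hypothetical sample holding a few pos values from the question as strings, and pd.to_numeric is used here as an alternative to astype(int); both are assumptions for illustration, not part of the original code.

```python
import pandas as pd

# Hypothetical sample: pos values taken from the question, stored as strings.
demo = pd.DataFrame({
    "contig": [2, 2, 2, 2],
    "pos": ["5207", "155667", "67910", "118146"],
})

# Strings compare character by character, so "118146" sorts before "5207".
string_order = demo.sort_values(by=["contig", "pos"])["pos"].tolist()
print(string_order)  # ['118146', '155667', '5207', '67910']

# After conversion the same call sorts by numeric value.
demo["pos"] = pd.to_numeric(demo["pos"])
numeric_order = demo.sort_values(by=["contig", "pos"])["pos"].tolist()
print(numeric_order)  # [5207, 67910, 118146, 155667]

# A placeholder such as "-" (as seen in odds_ratio) cannot be parsed as a number;
# errors="coerce" turns such entries into NaN instead of raising.
odds = pd.to_numeric(pd.Series(["167.922", "-", "1.0"]), errors="coerce")
print(odds.isna().tolist())  # [False, True, False]
```

If astype(int) raises a ValueError on the real data, a stray non-numeric entry in the column is a likely cause, and the errors="coerce" variant can help locate it.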

Thanks,

