Reputation: 23
I have the following code that is supposed to sort a dataframe called "coint_results"
results = np.array([combinations[pair][0], combinations[pair][1], a, b, egtest[0]])
coint_results = coint_results.append(pd.DataFrame(results).T)
coint_results = coint_results.reset_index().dropna().drop("index", axis=1)
coint_results.columns = ['stock_1', 'stock_2', 'a', 'b', 't-stat']
coint_results.sort_values(by=['t-stat'], ascending=True, inplace=True, ignore_index=True)
print(coint_results)
I thought it should worked well, but here the result
stock_1 stock_2 ... b t-stat
0 EXO.MI TIT.MI ... 0.03767853882823848 -10.03626667966398
1 BAMI.MI TIT.MI ... 0.002448683245146175 -10.054806761275424
2 STM.MI TIT.MI ... -0.002184095752726706 -10.068964759967473
3 SRG.MI TIT.MI ... 0.4478998680510796 -10.115909021808724
4 SPM.MI TIT.MI ... 0.06460752028453558 -10.354583264537016
5 ENEL.MI TIT.MI ... 0.11502811550372381 -11.004160730546342
6 SRG.MI UNI.MI ... 11.243141736537956 -3.906435255815229
7 IP.MI STM.MI ... -4.73690752582681 -3.9585359373191573
8 STLA.MI STM.MI ... 5.160916664191976 -4.121749682409465
9 BAMI.MI STM.MI ... 0.5317077958371442 -4.311274260473334
10 TIT.MI UCG.MI ... 6.132134120131706 -4.475906614947818
11 ISP.MI STM.MI ... 0.6185417904201413 -4.506308325127962
12 BZU.MI UCG.MI ... -4.5712728593642025 -5.0446571567547585
13 G.MI UCG.MI ... 5.2911521693982175 -5.1304167685585265
14 SRG.MI UCG.MI ... -17.988827453255993 -5.1784094787560635
15 BMED.MI UCG.MI ... 9.924937184752496 -5.406007568494866
16 IP.MI UCG.MI ... -10.742575715380323 -5.54071721843284
17 STM.MI UCG.MI ... 2.3788486425865387 -5.620706678454393
18 ISP.MI UCG.MI ... 8.092421011705971 -6.016276738559553
19 STLA.MI UCG.MI ... 21.6916172567256 -6.176169724335034
20 BAMI.MI UCG.MI ... 1.5867053473179111 -6.749604800157616
21 G.MI TIT.MI ... 0.2427544870802295 -9.048408747110315
22 IP.MI TIT.MI ... 0.03859781538585183 -9.87594779908127
23 ISP.MI TIT.MI ... 0.017164281671812998 -9.887616220510456
24 BMED.MI TIT.MI ... -0.04082879951061795 -9.8910162081357
25 STLA.MI TIT.MI ... 0.09583782780784517 -9.910261629446035
26 MB.MI TIT.MI ... 0.1551557504463611 -9.91316985348034
27 BZU.MI TIT.MI ... 0.03270215832874518 -9.944786934999167
Apparently Pandas sorts value in a descending order, but the first 6 values should go at the bottom of the dataframe. It doesn't make any sense to me.
Could someone help me to figure it out?
Upvotes: 1
Views: 412
Reputation: 35626
That is lexicographic ordering. For some reason, t-stat
is of type object.
Convert to_numeric
before sorting:
coint_results['t-stat'] = pd.to_numeric(coint_results['t-stat'])
coint_results.sort_values(by=['t-stat'], ascending=True,
inplace=True, ignore_index=True)
Default display behaviour will hide some values can set_option
display precision to see more:
pd.set_option('display.precision', 15)
print(coint_results)
coint_results
:
t-stat
0 -11.004160730546342
1 -10.354583264537016
2 -10.115909021808724
3 -10.068964759967473
4 -10.054806761275424
5 -10.036266679663980
6 -9.944786934999167
7 -9.913169853480341
8 -9.910261629446035
9 -9.891016208135699
10 -9.887616220510456
11 -9.875947799081271
12 -9.048408747110315
13 -6.749604800157616
14 -6.176169724335034
15 -6.016276738559553
16 -5.620706678454393
17 -5.540717218432840
18 -5.406007568494866
19 -5.178409478756063
20 -5.130416768558526
21 -5.044657156754758
22 -4.506308325127962
23 -4.475906614947818
24 -4.311274260473334
25 -4.121749682409465
26 -3.958535937319157
27 -3.906435255815229
loc
can be used to select indexes based on the sorted column:
coint_results = (
coint_results.loc[
pd.to_numeric(coint_results['t-stat'])
.sort_values(ascending=True).index
].reset_index(drop=True)
)
Or, a temporary column can bed created to sort by, then drop
from the DataFrame:
coint_results['sortby'] = pd.to_numeric(coint_results['t-stat'])
coint_results.sort_values(by='sortby', ascending=True, inplace=True,
ignore_index=True)
coint_results.drop('sortby', axis=1, inplace=True)
print(coint_results)
Both of these options create a separate Series to sort with so no change in the values in the t-stat
column.
coint_results
:
t-stat
0 -11.004160730546342
1 -10.354583264537016
2 -10.115909021808724
3 -10.068964759967473
4 -10.054806761275424
5 -10.03626667966398
6 -9.944786934999167
7 -9.91316985348034
8 -9.910261629446035
9 -9.8910162081357
10 -9.887616220510456
11 -9.87594779908127
12 -9.048408747110315
13 -6.749604800157616
14 -6.176169724335034
15 -6.016276738559553
16 -5.620706678454393
17 -5.54071721843284
18 -5.406007568494866
19 -5.1784094787560635
20 -5.1304167685585265
21 -5.0446571567547585
22 -4.506308325127962
23 -4.475906614947818
24 -4.311274260473334
25 -4.121749682409465
26 -3.9585359373191573
27 -3.906435255815229
DataFrame Constructor:
coint_results = pd.DataFrame({
't-stat': ['-10.03626667966398', '-10.054806761275424',
'-10.068964759967473', '-10.115909021808724',
'-10.354583264537016', '-11.004160730546342',
'-3.906435255815229', '-3.9585359373191573',
'-4.121749682409465', '-4.311274260473334', '-4.475906614947818',
'-4.506308325127962', '-5.0446571567547585',
'-5.1304167685585265', '-5.1784094787560635',
'-5.406007568494866', '-5.54071721843284', '-5.620706678454393',
'-6.016276738559553', '-6.176169724335034', '-6.749604800157616',
'-9.048408747110315', '-9.87594779908127', '-9.887616220510456',
'-9.8910162081357', '-9.910261629446035', '-9.91316985348034',
'-9.944786934999167']
})
Upvotes: 4