Reputation: 147
Despite searching this forum and other sources, I still have no idea how to fix this problem.
To explain: I was running the script below to download prices for all stocks in the FTSE MIB 40 in order to look for the best cointegrated pairs. Unfortunately, when I run the script (which already works for other markets), it reports an error about NaNs or infinite values. I tried using dropna, but the problem persists. Here is my whole code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels
from statsmodels.tsa.stattools import coint
from statsmodels.tsa.stattools import adfuller
import yfinance as yf
import pandas_datareader as pdr
import seaborn
ftse_mib40=['A2A.MI','AMP.MI','ATL.MI','AZM.MI','BAMI.MI','BPE.MI','BZU.MI','CPR.MI','CNHI.MI','DIA.MI','ENEL.MI','ENI.MI','EXO.MI','RACE.MI','FCA.MI','FBK.MI','G.MI','HER.MI','ISP.MI','IG.MI','JUVE.MI','LDO.MI','MB.MI','MONC.MI','NEXI.MI','NXEN','PIRC.MI','PST.MI','PRY.MI','REC.MI','SPM.MI','SFER.MI','SRG.MI','STM.MI','TIT.MI','TEN.MI','TRN.MI','UBI.MI','UCG.MI','UNI.MI','US.MI']
ftse_yah=pdr.get_data_yahoo(ftse_mib40,start='2017-01-01',end='2019-09-27')
ftse_matrix=ftse_yah['Adj Close']
ftse_matrix=ftse_matrix.replace([np.inf, -np.inf], np.nan).dropna(how='all')
def find_cointegrated_pairs(data):
    n = data.shape[1]  # i.e. number of columns
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    keys = data.keys()
    pairs = []
    for i in range(n):
        for j in range(i+1, n):
            S1 = data[keys[i]]
            S2 = data[keys[j]]
            result = coint(S1, S2)
            score = result[0]
            pvalue = result[1]
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue
            if pvalue < 0.02:
                pairs.append((keys[i], keys[j]))
    return score_matrix, pvalue_matrix, pairs
tstat,pv,coppie=find_cointegrated_pairs(ftse_matrix)
Finally, one (very) stupid question: any idea how to locate all the infinite/NaN values in the matrix? Thanks, and sorry for the long code.
Upvotes: 2
Views: 2779
Reputation: 463
To count the missing values in each column of your dataframe, you can run the following:
pd.isnull(ftse_matrix).sum()
You'll see that you still have missing values in ftse_matrix.
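If you also want to know where the bad values sit (not just how many there are), something along these lines should work. This is only a sketch on a toy frame: the tickers, dates and prices below are made up to stand in for ftse_matrix, and `bad_mask`, `bad_per_column` and `bad_locations` are names I picked for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for ftse_matrix (hypothetical tickers and prices).
prices = pd.DataFrame(
    {
        "AAA.MI": [1.0, np.nan, 3.0, 4.0],
        "BBB.MI": [2.0, 2.5, np.inf, 3.5],
        "CCC.MI": [5.0, 5.5, 6.0, 6.5],
    },
    index=pd.date_range("2019-01-01", periods=4, freq="D"),
)

# True wherever a cell is NaN, +inf or -inf.
bad_mask = ~np.isfinite(prices)

# Number of bad cells in each column.
bad_per_column = bad_mask.sum()

# (date, ticker) label pairs for every bad cell, in row-major order.
rows, cols = np.where(bad_mask.to_numpy())
bad_locations = [(prices.index[r], prices.columns[c]) for r, c in zip(rows, cols)]

print(bad_per_column)
print(bad_locations)
```

Unlike pd.isnull, np.isfinite also flags infinite values, so the replace([np.inf, -np.inf], np.nan) step is not needed just to find them.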
To drop the missing values, replace this
ftse_matrix=ftse_matrix.replace([np.inf, -np.inf], np.nan).dropna(how='all')
with this
ftse_matrix=ftse_matrix.dropna()
With dropna, the argument how='all' only drops a row if all of the values in that row are missing. The default, how='any', drops a row if any value in it is missing.
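To see the difference between the two settings, here is a small sketch on a made-up two-column frame (the ticker names and values are placeholders, not your data):

```python
import numpy as np
import pandas as pd

# Hypothetical two-ticker frame: row 1 is partly missing, row 2 is fully missing.
df = pd.DataFrame(
    {
        "AAA.MI": [1.0, np.nan, np.nan, 4.0],
        "BBB.MI": [2.0, 2.5, np.nan, 3.5],
    }
)

# how="all": only the fully missing row 2 is dropped; row 1 keeps its NaN.
kept_all = df.dropna(how="all")

# Default how="any": rows 1 and 2 are both dropped, so no NaN remains.
kept_any = df.dropna()

print(len(df), len(kept_all), len(kept_any))
```

Keep in mind that with the default, a single ticker with a long run of missing prices removes those dates for every other ticker as well, so it may be worth checking how many rows are left before running coint.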
Upvotes: 2