Polar

Reputation: 147

Pandas: Missing values from Yahoo Finance

Despite searching this forum and other sources, I still have no idea how to fix this problem.

To explain: I was running the script below to download prices for all stocks in the FTSE MIB 40, in order to find the best cointegrated pairs. Unfortunately, when I run the script (which already works for other markets), it reports an error about NaN or infinite values. I tried using dropna, but the problem persists. Here is my whole code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels
from statsmodels.tsa.stattools import coint
from statsmodels.tsa.stattools import adfuller
import yfinance as yf
import pandas_datareader as pdr
import seaborn

ftse_mib40=['A2A.MI','AMP.MI','ATL.MI','AZM.MI','BAMI.MI','BPE.MI','BZU.MI','CPR.MI','CNHI.MI','DIA.MI','ENEL.MI','ENI.MI','EXO.MI','RACE.MI','FCA.MI','FBK.MI','G.MI','HER.MI','ISP.MI','IG.MI','JUVE.MI','LDO.MI','MB.MI','MONC.MI','NEXI.MI','NXEN','PIRC.MI','PST.MI','PRY.MI','REC.MI','SPM.MI','SFER.MI','SRG.MI','STM.MI','TIT.MI','TEN.MI','TRN.MI','UBI.MI','UCG.MI','UNI.MI','US.MI']
ftse_yah=pdr.get_data_yahoo(ftse_mib40,start='2017-01-01',end='2019-09-27')
ftse_matrix=ftse_yah['Adj Close']
ftse_matrix=ftse_matrix.replace([np.inf, -np.inf], np.nan).dropna(how='all')

def find_cointegrated_pairs(data):
    n = data.shape[1]  # i.e. the number of columns
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    keys = data.keys()
    pairs = []
    for i in range(n):
        for j in range(i+1, n):
            S1 = data[keys[i]]
            S2 = data[keys[j]]
            result = coint(S1, S2)
            score = result[0]
            pvalue = result[1]
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue
            if pvalue < 0.02:
                pairs.append((keys[i], keys[j]))
    return score_matrix, pvalue_matrix, pairs
tstat,pv,coppie=find_cointegrated_pairs(ftse_matrix)

Finally, one (very) stupid question: any idea how to locate all the infinite/NaN values in the matrix? Thanks, and sorry for the long code.

Upvotes: 2

Views: 2779

Answers (1)

Kyle Safran

Reputation: 463

To count the missing values in each column of your dataframe, you can run the following:

pd.isnull(ftse_matrix).sum()

You'll see that you still have missing values in ftse_matrix.
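If you also want to see where the missing or infinite values sit, not just how many there are, something along these lines may help. It is only a sketch on a made-up frame: `prices` and the tickers `AAA`, `BBB`, `CCC` are hypothetical stand-ins for ftse_matrix, not data from Yahoo.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ftse_matrix: 4 dates, 3 made-up tickers.
prices = pd.DataFrame(
    {
        "AAA": [1.0, np.nan, 3.0, 4.0],
        "BBB": [2.0, 2.5, np.inf, 3.5],
        "CCC": [np.nan, np.nan, np.nan, np.nan],
    },
    index=pd.date_range("2019-01-01", periods=4),
)

# Treat +/-inf as missing so a single boolean mask covers both cases.
cleaned = prices.replace([np.inf, -np.inf], np.nan)
mask = cleaned.isna()

# Number of missing values per ticker.
missing_per_column = mask.sum()

# Rows (dates) that contain at least one missing value.
rows_with_missing = cleaned[mask.any(axis=1)]

# Tickers for which no data came back at all.
all_missing_columns = list(cleaned.columns[mask.all()])

# (date, ticker) label of every missing cell.
row_pos, col_pos = np.where(mask.to_numpy())
missing_cells = [
    (cleaned.index[r], cleaned.columns[c]) for r, c in zip(row_pos, col_pos)
]

print(missing_per_column)
print(all_missing_columns)
print(missing_cells)
```

Checking `all_missing_columns` first is probably worthwhile, since a column with no data at all changes what a plain dropna will do.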

To drop them, replace this:

ftse_matrix=ftse_matrix.replace([np.inf, -np.inf], np.nan).dropna(how='all')

with this:

ftse_matrix=ftse_matrix.dropna()

With dropna, the argument how='all' drops a row only if every value in that row is missing. The default, how='any', drops a row as soon as any single value in it is missing, which is what you need here.
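The difference is easy to see on a toy frame. The sketch below uses a made-up `prices` dataframe with hypothetical tickers. The all-NaN column `CCC` is an assumption added to illustrate one caveat: if any ticker returned no data at all, a plain dropna() would remove every row, so it can be worth dropping empty columns first.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: CCC has no data at all, row 3 is entirely empty.
prices = pd.DataFrame(
    {
        "AAA": [1.0, np.nan, 3.0, np.nan],
        "BBB": [2.0, 2.5, 3.0, np.nan],
        "CCC": [np.nan, np.nan, np.nan, np.nan],
    }
)

# how='all': only the completely empty row 3 is dropped, NaNs remain.
only_empty_rows_dropped = prices.dropna(how="all")

# Default how='any': every row has a NaN in CCC, so nothing is left.
any_missing_dropped = prices.dropna()

# Drop columns that are entirely missing, then rows with any gap.
without_empty_columns = prices.dropna(axis=1, how="all")
complete_rows = without_empty_columns.dropna()

print(len(only_empty_rows_dropped), len(any_missing_dropped))
print(complete_rows)
```

Whether dropping rows is acceptable depends on how many dates you lose. That trade-off is yours to judge for the cointegration test.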

Upvotes: 2
