Removing NaN values from np array for linear regression

Question

(Beginner programmer, please be nice!) I am trying to run a linear regression on data from csv files for certain years, but some of the files are missing data for one or more years. The LinearRegression function I'm using from sklearn seems to convert these NaN values to 0 automatically, which messes up the regression results for that particular csv file. Here is what I currently have in my loop:

    munilist = ["Adjuntas", "Anasco", "Ciales", "Jayuya", "Lares", "LasMarias", "Maricao", "Mayaguez", "Orocovis", "Penuelas", "Ponce", "SabanaGrande", "SanGerman", "SanSebastian", "Utuado", "Yauco"]
    for municipality in munilist:

        x = np.array([1987, 1992, 1998, 2002, 2007, 2012])
        x = x.reshape(6,1)
        y = np.array(df[df["Municipio"]==municipality].iloc[0, 1:7]).reshape(6,1)
        mask = x[~pd.isna(x)] & y[~pd.isna(y)]
        xlin = np.arange(1987, 2013,1) #range of years to plot
        reg = LinearRegression(fit_intercept=True).fit(x[mask], y[mask])
        a0 = reg.intercept_
        a1 = reg.coef_[0]

I'm not even sure I built the mask correctly, but I keep getting this error when I try to use it: "arrays used as indices must be of integer (or boolean) type"


Answers (1)
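
One possible fix, sketched under some assumptions. In the question's code, x[~pd.isna(x)] and y[~pd.isna(y)] return the surviving values, not True/False flags, so combining them with & does not produce a boolean mask, and indexing with the result fails with the error shown. Since only y can contain NaN here, it should be enough to build one boolean mask from y, apply it to both arrays while they are still 1-D, and reshape x to a column afterwards. The DataFrame below is a made-up stand-in for the asker's df (the column names and numbers are hypothetical), and the loop covers only two municipalities to keep it short. As far as I know, LinearRegression raises an error on NaN input instead of zero-filling it, so if zeros are showing up they may come from an earlier step.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

years = np.array([1987, 1992, 1998, 2002, 2007, 2012], dtype=float)

# Hypothetical stand-in for the asker's df: one row per municipality,
# one column per year, NaN where a year has no data.
df = pd.DataFrame(
    [["Adjuntas", 10.0, 12.0, np.nan, 16.0, 18.0, 20.0],
     ["Anasco", 5.0, np.nan, np.nan, 8.0, 9.0, 10.0]],
    columns=["Municipio", "1987", "1992", "1998", "2002", "2007", "2012"],
)

results = {}
for municipality in ["Adjuntas", "Anasco"]:
    # Pull the six yearly values as a 1-D float array so np.isnan works.
    row = df[df["Municipio"] == municipality].iloc[0, 1:7]
    y = row.to_numpy(dtype=float)

    # Boolean mask: True where y has a real value.
    mask = ~np.isnan(y)

    # Filter both arrays with the same mask, then make x a column for sklearn.
    x_fit = years[mask].reshape(-1, 1)
    y_fit = y[mask]

    reg = LinearRegression(fit_intercept=True).fit(x_fit, y_fit)
    a0 = reg.intercept_
    a1 = reg.coef_[0]
    results[municipality] = (a0, a1, int(mask.sum()))
```

Here results maps each municipality to its intercept, slope, and the number of years actually used in the fit, which makes it easy to spot municipalities with too few points to trust the regression.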
