Vowpal Wabbit Python Sklearn - Predict vw Format

Question

I was working with the Python wrapper (sklearn) for VW but couldn't figure out how to use namespaces, so I decided to bypass tovw() and create my own formatted list.

First I exported the training and test sets to text files, ran them with vw through the terminal, and everything worked well. I then tried to run the same thing with the Python wrapper. Training the model appeared to work, but I get an error when I try to predict. Is it not possible to predict with data that is already formatted as vw?

This is the code used to export the data and to create the lists:

vwX=X_train["Response"].astype('str')+' |'\
'a '+\
'Product_Info_4:'+X_train["Product_Info_4"].astype('str')+' '\
'Ins_Age:'+X_train["Ins_Age"].astype('str')+' '\
'Ht:'+X_train["Ht"].astype('str')+' '\
'Wt:'+X_train["Wt"].astype('str')+' '\
'BMI:'+X_train["BMI"].astype('str')+' '\
'Employment_Info_1:'+X_train["Employment_Info_1"].astype('str')+' '\
'Employment_Info_4:'+X_train["Employment_Info_4"].astype('str')+' '\
'Employment_Info_6:'+X_train["Employment_Info_6"].astype('str')+' '\
'Insurance_History_5:'+X_train["Insurance_History_5"].astype('str')+' '\
'Family_Hist_2:'+X_train["Family_Hist_2"].astype('str')+' '\
'Family_Hist_3:'+X_train["Family_Hist_3"].astype('str')+' '\
'Family_Hist_4:'+X_train["Family_Hist_4"].astype('str')+' '\
'Family_Hist_5:'+X_train["Family_Hist_5"].astype('str')+' '\
'Medical_History_1:'+X_train["Medical_History_1"].astype('str')+' '\
'Medical_History_15:'+X_train["Medical_History_15"].astype('str')+' '\
'Medical_History_24:'+X_train["Medical_History_24"].astype('str')+' '\
'Medical_History_32:'+X_train["Medical_History_32"].astype('str')+' '\
'|b '+\
np.where(X_train["Medical_Keyword_1"] ==0,''," Medical_Keyword_1")+\
np.where(X_train["Medical_Keyword_2"] ==0,''," Medical_Keyword_2")+\
np.where(X_train["Medical_Keyword_3"] ==0,''," Medical_Keyword_3")+\
np.where(X_train["Medical_Keyword_4"] ==0,''," Medical_Keyword_4")+\
np.where(X_train["Medical_Keyword_5"] ==0,''," Medical_Keyword_5")+\
np.where(X_train["Medical_Keyword_6"] ==0,''," Medical_Keyword_6")+\
np.where(X_train["Medical_Keyword_7"] ==0,''," Medical_Keyword_7")+\
np.where(X_train["Medical_Keyword_8"] ==0,''," Medical_Keyword_8")+\
np.where(X_train["Medical_Keyword_9"] ==0,''," Medical_Keyword_9")+\
np.where(X_train["Medical_Keyword_10"] ==0,''," Medical_Keyword_10")+\
np.where(X_train["Medical_Keyword_11"] ==0,''," Medical_Keyword_11")+\
np.where(X_train["Medical_Keyword_12"] ==0,''," Medical_Keyword_12")+\
np.where(X_train["Medical_Keyword_13"] ==0,''," Medical_Keyword_13")+\
np.where(X_train["Medical_Keyword_14"] ==0,''," Medical_Keyword_14")+\
np.where(X_train["Medical_Keyword_15"] ==0,''," Medical_Keyword_15")+\
np.where(X_train["Medical_Keyword_16"] ==0,''," Medical_Keyword_16")+\
np.where(X_train["Medical_Keyword_17"] ==0,''," Medical_Keyword_17")+\
np.where(X_train["Medical_Keyword_18"] ==0,''," Medical_Keyword_18")+\
np.where(X_train["Medical_Keyword_19"] ==0,''," Medical_Keyword_19")+\
np.where(X_train["Medical_Keyword_20"] ==0,''," Medical_Keyword_20")+\
np.where(X_train["Medical_Keyword_21"] ==0,''," Medical_Keyword_21")+\
np.where(X_train["Medical_Keyword_22"] ==0,''," Medical_Keyword_22")+\
np.where(X_train["Medical_Keyword_23"] ==0,''," Medical_Keyword_23")+\
np.where(X_train["Medical_Keyword_24"] ==0,''," Medical_Keyword_24")+\
np.where(X_train["Medical_Keyword_25"] ==0,''," Medical_Keyword_25")+\
np.where(X_train["Medical_Keyword_26"] ==0,''," Medical_Keyword_26")+\
np.where(X_train["Medical_Keyword_27"] ==0,''," Medical_Keyword_27")+\
np.where(X_train["Medical_Keyword_28"] ==0,''," Medical_Keyword_28")+\
np.where(X_train["Medical_Keyword_29"] ==0,''," Medical_Keyword_29")+\
np.where(X_train["Medical_Keyword_30"] ==0,''," Medical_Keyword_30")+\
np.where(X_train["Medical_Keyword_31"] ==0,''," Medical_Keyword_31")+\
np.where(X_train["Medical_Keyword_32"] ==0,''," Medical_Keyword_32")+\
np.where(X_train["Medical_Keyword_33"] ==0,''," Medical_Keyword_33")+\
np.where(X_train["Medical_Keyword_34"] ==0,''," Medical_Keyword_34")+\
np.where(X_train["Medical_Keyword_35"] ==0,''," Medical_Keyword_35")+\
np.where(X_train["Medical_Keyword_36"] ==0,''," Medical_Keyword_36")+\
np.where(X_train["Medical_Keyword_37"] ==0,''," Medical_Keyword_37")+\
np.where(X_train["Medical_Keyword_38"] ==0,''," Medical_Keyword_38")+\
np.where(X_train["Medical_Keyword_39"] ==0,''," Medical_Keyword_39")+\
np.where(X_train["Medical_Keyword_40"] ==0,''," Medical_Keyword_40")+\
np.where(X_train["Medical_Keyword_41"] ==0,''," Medical_Keyword_41")+\
np.where(X_train["Medical_Keyword_42"] ==0,''," Medical_Keyword_42")+\
np.where(X_train["Medical_Keyword_43"] ==0,''," Medical_Keyword_43")+\
np.where(X_train["Medical_Keyword_44"] ==0,''," Medical_Keyword_44")+\
np.where(X_train["Medical_Keyword_45"] ==0,''," Medical_Keyword_45")+\
np.where(X_train["Medical_Keyword_46"] ==0,''," Medical_Keyword_46")+\
np.where(X_train["Medical_Keyword_47"] ==0,''," Medical_Keyword_47")+\
np.where(X_train["Medical_Keyword_48"] ==0,''," Medical_Keyword_48")+\
' |c '+\
"Product_Info_1_"+X_train["Product_Info_1"].astype('str')+' '\
"Product_Info_2_"+X_train["Product_Info_2"].astype('str')+' '\
"Product_Info_3_"+X_train["Product_Info_3"].astype('str')+' '\
"Product_Info_5_"+X_train["Product_Info_5"].astype('str')+' '\
"Product_Info_6_"+X_train["Product_Info_6"].astype('str')+' '\
"Product_Info_7_"+X_train["Product_Info_7"].astype('str')+' '\
"Employment_Info_2_"+X_train["Employment_Info_2"].astype('str')+' '\
"Employment_Info_3_"+X_train["Employment_Info_3"].astype('str')+' '\
"Employment_Info_5_"+X_train["Employment_Info_5"].astype('str')+' '\
"InsuredInfo_1_"+X_train["InsuredInfo_1"].astype('str')+' '\
"InsuredInfo_2_"+X_train["InsuredInfo_2"].astype('str')+' '\
"InsuredInfo_3_"+X_train["InsuredInfo_3"].astype('str')+' '\
"InsuredInfo_4_"+X_train["InsuredInfo_4"].astype('str')+' '\
"InsuredInfo_5_"+X_train["InsuredInfo_5"].astype('str')+' '\
"InsuredInfo_6_"+X_train["InsuredInfo_6"].astype('str')+' '\
"InsuredInfo_7_"+X_train["InsuredInfo_7"].astype('str')+' '\
"Insurance_History_1_"+X_train["Insurance_History_1"].astype('str')+' '\
"Insurance_History_2_"+X_train["Insurance_History_2"].astype('str')+' '\
"Insurance_History_3_"+X_train["Insurance_History_3"].astype('str')+' '\
"Insurance_History_4_"+X_train["Insurance_History_4"].astype('str')+' '\
"Insurance_History_7_"+X_train["Insurance_History_7"].astype('str')+' '\
"Insurance_History_8_"+X_train["Insurance_History_8"].astype('str')+' '\
"Insurance_History_9_"+X_train["Insurance_History_9"].astype('str')+' '\
"Family_Hist_1_"+X_train["Family_Hist_1"].astype('str')+' '\
"Medical_History_2_"+X_train["Medical_History_2"].astype('str')+' '\
"Medical_History_3_"+X_train["Medical_History_3"].astype('str')+' '\
"Medical_History_4_"+X_train["Medical_History_4"].astype('str')+' '\
"Medical_History_5_"+X_train["Medical_History_5"].astype('str')+' '\
"Medical_History_6_"+X_train["Medical_History_6"].astype('str')+' '\
"Medical_History_7_"+X_train["Medical_History_7"].astype('str')+' '\
"Medical_History_8_"+X_train["Medical_History_8"].astype('str')+' '\
"Medical_History_9_"+X_train["Medical_History_9"].astype('str')+' '\
"Medical_History_10_"+X_train["Medical_History_10"].astype('str')+' '\
"Medical_History_11_"+X_train["Medical_History_11"].astype('str')+' '\
"Medical_History_12_"+X_train["Medical_History_12"].astype('str')+' '\
"Medical_History_13_"+X_train["Medical_History_13"].astype('str')+' '\
"Medical_History_14_"+X_train["Medical_History_14"].astype('str')+' '\
"Medical_History_16_"+X_train["Medical_History_16"].astype('str')+' '\
"Medical_History_17_"+X_train["Medical_History_17"].astype('str')+' '\
"Medical_History_18_"+X_train["Medical_History_18"].astype('str')+' '\
"Medical_History_19_"+X_train["Medical_History_19"].astype('str')+' '\
"Medical_History_20_"+X_train["Medical_History_20"].astype('str')+' '\
"Medical_History_21_"+X_train["Medical_History_21"].astype('str')+' '\
"Medical_History_22_"+X_train["Medical_History_22"].astype('str')+' '\
"Medical_History_23_"+X_train["Medical_History_23"].astype('str')+' '\
"Medical_History_25_"+X_train["Medical_History_25"].astype('str')+' '\
"Medical_History_26_"+X_train["Medical_History_26"].astype('str')+' '\
"Medical_History_27_"+X_train["Medical_History_27"].astype('str')+' '\
"Medical_History_28_"+X_train["Medical_History_28"].astype('str')+' '\
"Medical_History_29_"+X_train["Medical_History_29"].astype('str')+' '\
"Medical_History_30_"+X_train["Medical_History_30"].astype('str')+' '\
"Medical_History_31_"+X_train["Medical_History_31"].astype('str')+' '\
"Medical_History_33_"+X_train["Medical_History_33"].astype('str')+' '\
"Medical_History_34_"+X_train["Medical_History_34"].astype('str')+' '\
"Medical_History_35_"+X_train["Medical_History_35"].astype('str')+' '\
"Medical_History_36_"+X_train["Medical_History_36"].astype('str')+' '\
"Medical_History_37_"+X_train["Medical_History_37"].astype('str')+' '\
"Medical_History_38_"+X_train["Medical_History_38"].astype('str')+' '\
"Medical_History_39_"+X_train["Medical_History_39"].astype('str')+' '\
"Medical_History_40_"+X_train["Medical_History_40"].astype('str')+' '\
"Medical_History_41_"+X_train["Medical_History_41"].astype('str')

vwX.to_csv('train.vw',mode='a', header=False,index=False)



vwX_T='1'+ ' |'\
'a '+\
'Product_Info_4:'+X_test["Product_Info_4"].astype('str')+' '\
'Ins_Age:'+X_test["Ins_Age"].astype('str')+' '\
'Ht:'+X_test["Ht"].astype('str')+' '\
'Wt:'+X_test["Wt"].astype('str')+' '\
'BMI:'+X_test["BMI"].astype('str')+' '\
'Employment_Info_1:'+X_test["Employment_Info_1"].astype('str')+' '\
'Employment_Info_4:'+X_test["Employment_Info_4"].astype('str')+' '\
'Employment_Info_6:'+X_test["Employment_Info_6"].astype('str')+' '\
'Insurance_History_5:'+X_test["Insurance_History_5"].astype('str')+' '\
'Family_Hist_2:'+X_test["Family_Hist_2"].astype('str')+' '\
'Family_Hist_3:'+X_test["Family_Hist_3"].astype('str')+' '\
'Family_Hist_4:'+X_test["Family_Hist_4"].astype('str')+' '\
'Family_Hist_5:'+X_test["Family_Hist_5"].astype('str')+' '\
'Medical_History_1:'+X_test["Medical_History_1"].astype('str')+' '\
'Medical_History_15:'+X_test["Medical_History_15"].astype('str')+' '\
'Medical_History_24:'+X_test["Medical_History_24"].astype('str')+' '\
'Medical_History_32:'+X_test["Medical_History_32"].astype('str')+' '\
'|b '+\
np.where(X_test["Medical_Keyword_1"] ==0,''," Medical_Keyword_1")+\
np.where(X_test["Medical_Keyword_2"] ==0,''," Medical_Keyword_2")+\
np.where(X_test["Medical_Keyword_3"] ==0,''," Medical_Keyword_3")+\
np.where(X_test["Medical_Keyword_4"] ==0,''," Medical_Keyword_4")+\
np.where(X_test["Medical_Keyword_5"] ==0,''," Medical_Keyword_5")+\
np.where(X_test["Medical_Keyword_6"] ==0,''," Medical_Keyword_6")+\
np.where(X_test["Medical_Keyword_7"] ==0,''," Medical_Keyword_7")+\
np.where(X_test["Medical_Keyword_8"] ==0,''," Medical_Keyword_8")+\
np.where(X_test["Medical_Keyword_9"] ==0,''," Medical_Keyword_9")+\
np.where(X_test["Medical_Keyword_10"] ==0,''," Medical_Keyword_10")+\
np.where(X_test["Medical_Keyword_11"] ==0,''," Medical_Keyword_11")+\
np.where(X_test["Medical_Keyword_12"] ==0,''," Medical_Keyword_12")+\
np.where(X_test["Medical_Keyword_13"] ==0,''," Medical_Keyword_13")+\
np.where(X_test["Medical_Keyword_14"] ==0,''," Medical_Keyword_14")+\
np.where(X_test["Medical_Keyword_15"] ==0,''," Medical_Keyword_15")+\
np.where(X_test["Medical_Keyword_16"] ==0,''," Medical_Keyword_16")+\
np.where(X_test["Medical_Keyword_17"] ==0,''," Medical_Keyword_17")+\
np.where(X_test["Medical_Keyword_18"] ==0,''," Medical_Keyword_18")+\
np.where(X_test["Medical_Keyword_19"] ==0,''," Medical_Keyword_19")+\
np.where(X_test["Medical_Keyword_20"] ==0,''," Medical_Keyword_20")+\
np.where(X_test["Medical_Keyword_21"] ==0,''," Medical_Keyword_21")+\
np.where(X_test["Medical_Keyword_22"] ==0,''," Medical_Keyword_22")+\
np.where(X_test["Medical_Keyword_23"] ==0,''," Medical_Keyword_23")+\
np.where(X_test["Medical_Keyword_24"] ==0,''," Medical_Keyword_24")+\
np.where(X_test["Medical_Keyword_25"] ==0,''," Medical_Keyword_25")+\
np.where(X_test["Medical_Keyword_26"] ==0,''," Medical_Keyword_26")+\
np.where(X_test["Medical_Keyword_27"] ==0,''," Medical_Keyword_27")+\
np.where(X_test["Medical_Keyword_28"] ==0,''," Medical_Keyword_28")+\
np.where(X_test["Medical_Keyword_29"] ==0,''," Medical_Keyword_29")+\
np.where(X_test["Medical_Keyword_30"] ==0,''," Medical_Keyword_30")+\
np.where(X_test["Medical_Keyword_31"] ==0,''," Medical_Keyword_31")+\
np.where(X_test["Medical_Keyword_32"] ==0,''," Medical_Keyword_32")+\
np.where(X_test["Medical_Keyword_33"] ==0,''," Medical_Keyword_33")+\
np.where(X_test["Medical_Keyword_34"] ==0,''," Medical_Keyword_34")+\
np.where(X_test["Medical_Keyword_35"] ==0,''," Medical_Keyword_35")+\
np.where(X_test["Medical_Keyword_36"] ==0,''," Medical_Keyword_36")+\
np.where(X_test["Medical_Keyword_37"] ==0,''," Medical_Keyword_37")+\
np.where(X_test["Medical_Keyword_38"] ==0,''," Medical_Keyword_38")+\
np.where(X_test["Medical_Keyword_39"] ==0,''," Medical_Keyword_39")+\
np.where(X_test["Medical_Keyword_40"] ==0,''," Medical_Keyword_40")+\
np.where(X_test["Medical_Keyword_41"] ==0,''," Medical_Keyword_41")+\
np.where(X_test["Medical_Keyword_42"] ==0,''," Medical_Keyword_42")+\
np.where(X_test["Medical_Keyword_43"] ==0,''," Medical_Keyword_43")+\
np.where(X_test["Medical_Keyword_44"] ==0,''," Medical_Keyword_44")+\
np.where(X_test["Medical_Keyword_45"] ==0,''," Medical_Keyword_45")+\
np.where(X_test["Medical_Keyword_46"] ==0,''," Medical_Keyword_46")+\
np.where(X_test["Medical_Keyword_47"] ==0,''," Medical_Keyword_47")+\
np.where(X_test["Medical_Keyword_48"] ==0,''," Medical_Keyword_48")+\
' |c '+\
"Product_Info_1_"+X_test["Product_Info_1"].astype('str')+' '\
"Product_Info_2_"+X_test["Product_Info_2"].astype('str')+' '\
"Product_Info_3_"+X_test["Product_Info_3"].astype('str')+' '\
"Product_Info_5_"+X_test["Product_Info_5"].astype('str')+' '\
"Product_Info_6_"+X_test["Product_Info_6"].astype('str')+' '\
"Product_Info_7_"+X_test["Product_Info_7"].astype('str')+' '\
"Employment_Info_2_"+X_test["Employment_Info_2"].astype('str')+' '\
"Employment_Info_3_"+X_test["Employment_Info_3"].astype('str')+' '\
"Employment_Info_5_"+X_test["Employment_Info_5"].astype('str')+' '\
"InsuredInfo_1_"+X_test["InsuredInfo_1"].astype('str')+' '\
"InsuredInfo_2_"+X_test["InsuredInfo_2"].astype('str')+' '\
"InsuredInfo_3_"+X_test["InsuredInfo_3"].astype('str')+' '\
"InsuredInfo_4_"+X_test["InsuredInfo_4"].astype('str')+' '\
"InsuredInfo_5_"+X_test["InsuredInfo_5"].astype('str')+' '\
"InsuredInfo_6_"+X_test["InsuredInfo_6"].astype('str')+' '\
"InsuredInfo_7_"+X_test["InsuredInfo_7"].astype('str')+' '\
"Insurance_History_1_"+X_test["Insurance_History_1"].astype('str')+' '\
"Insurance_History_2_"+X_test["Insurance_History_2"].astype('str')+' '\
"Insurance_History_3_"+X_test["Insurance_History_3"].astype('str')+' '\
"Insurance_History_4_"+X_test["Insurance_History_4"].astype('str')+' '\
"Insurance_History_7_"+X_test["Insurance_History_7"].astype('str')+' '\
"Insurance_History_8_"+X_test["Insurance_History_8"].astype('str')+' '\
"Insurance_History_9_"+X_test["Insurance_History_9"].astype('str')+' '\
"Family_Hist_1_"+X_test["Family_Hist_1"].astype('str')+' '\
"Medical_History_2_"+X_test["Medical_History_2"].astype('str')+' '\
"Medical_History_3_"+X_test["Medical_History_3"].astype('str')+' '\
"Medical_History_4_"+X_test["Medical_History_4"].astype('str')+' '\
"Medical_History_5_"+X_test["Medical_History_5"].astype('str')+' '\
"Medical_History_6_"+X_test["Medical_History_6"].astype('str')+' '\
"Medical_History_7_"+X_test["Medical_History_7"].astype('str')+' '\
"Medical_History_8_"+X_test["Medical_History_8"].astype('str')+' '\
"Medical_History_9_"+X_test["Medical_History_9"].astype('str')+' '\
"Medical_History_10_"+X_test["Medical_History_10"].astype('str')+' '\
"Medical_History_11_"+X_test["Medical_History_11"].astype('str')+' '\
"Medical_History_12_"+X_test["Medical_History_12"].astype('str')+' '\
"Medical_History_13_"+X_test["Medical_History_13"].astype('str')+' '\
"Medical_History_14_"+X_test["Medical_History_14"].astype('str')+' '\
"Medical_History_16_"+X_test["Medical_History_16"].astype('str')+' '\
"Medical_History_17_"+X_test["Medical_History_17"].astype('str')+' '\
"Medical_History_18_"+X_test["Medical_History_18"].astype('str')+' '\
"Medical_History_19_"+X_test["Medical_History_19"].astype('str')+' '\
"Medical_History_20_"+X_test["Medical_History_20"].astype('str')+' '\
"Medical_History_21_"+X_test["Medical_History_21"].astype('str')+' '\
"Medical_History_22_"+X_test["Medical_History_22"].astype('str')+' '\
"Medical_History_23_"+X_test["Medical_History_23"].astype('str')+' '\
"Medical_History_25_"+X_test["Medical_History_25"].astype('str')+' '\
"Medical_History_26_"+X_test["Medical_History_26"].astype('str')+' '\
"Medical_History_27_"+X_test["Medical_History_27"].astype('str')+' '\
"Medical_History_28_"+X_test["Medical_History_28"].astype('str')+' '\
"Medical_History_29_"+X_test["Medical_History_29"].astype('str')+' '\
"Medical_History_30_"+X_test["Medical_History_30"].astype('str')+' '\
"Medical_History_31_"+X_test["Medical_History_31"].astype('str')+' '\
"Medical_History_33_"+X_test["Medical_History_33"].astype('str')+' '\
"Medical_History_34_"+X_test["Medical_History_34"].astype('str')+' '\
"Medical_History_35_"+X_test["Medical_History_35"].astype('str')+' '\
"Medical_History_36_"+X_test["Medical_History_36"].astype('str')+' '\
"Medical_History_37_"+X_test["Medical_History_37"].astype('str')+' '\
"Medical_History_38_"+X_test["Medical_History_38"].astype('str')+' '\
"Medical_History_39_"+X_test["Medical_History_39"].astype('str')+' '\
"Medical_History_40_"+X_test["Medical_History_40"].astype('str')+' '\
"Medical_History_41_"+X_test["Medical_History_41"].astype('str')

vwX_T.to_csv('test.vw',mode='a', header=False,index=False)

#create a list to be used below in pyvw
vwX_lst=[]
vwX_T_lst=[]

for j in vwX.values:
    vwX_lst.append(j)

for j in vwX_T.values:
    vwX_T_lst.append(j)
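
The long concatenations above repeat one pattern per namespace: `name:value` pairs in `a`, the names of the non-zero keyword flags in `b`, and `name_value` tokens in `c`. As a minimal sketch of that pattern, assuming each record is available as a plain dict (for example from `DataFrame.to_dict('records')`), a small helper could build one line at a time. The helper name `vw_line`, its parameters, and the sample values are hypothetical and only for illustration:

```python
def vw_line(row, label, numeric, keywords, categorical):
    """Format one dict-like record as a VW text line with namespaces a, b and c."""
    # Namespace a: numeric features written as name:value.
    a = " ".join("{}:{}".format(name, row[name]) for name in numeric)
    # Namespace b: only the keyword flags that are set (non-zero) are emitted.
    b = " ".join(name for name in keywords if row[name] != 0)
    # Namespace c: categorical features written as name_value tokens.
    c = " ".join("{}_{}".format(name, row[name]) for name in categorical)
    return "{} |a {} |b {} |c {}".format(label, a, b, c)


# Hypothetical record using a few of the column names from the question.
row = {
    "Ins_Age": 0.64,
    "BMI": 0.32,
    "Medical_Keyword_1": 0,
    "Medical_Keyword_2": 1,
    "Product_Info_1": 1,
    "Product_Info_2": "D3",
}

line = vw_line(
    row,
    label=8,
    numeric=["Ins_Age", "BMI"],
    keywords=["Medical_Keyword_1", "Medical_Keyword_2"],
    categorical=["Product_Info_1", "Product_Info_2"],
)
print(line)
# 8 |a Ins_Age:0.64 BMI:0.32 |b Medical_Keyword_2 |c Product_Info_1_1 Product_Info_2_D3
```

Calling the helper once per record in a list comprehension yields the same kind of list of strings as `vwX_lst`, with the column lists stated in one place.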

Then I trained a model, which seemed to run OK:

import sys
sys.path.append('/home/anaconda/lib/python2.7/site-packages/vowpal_wabbit/python')

import pyvw
import sklearn_vw as slvw

import numpy as np
import pandas as pd

from sklearn.cross_validation import train_test_split,KFold
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler

import ml_metrics

mod=slvw.VWRegressor(passes=5, quadratic="aa ab")
mod.fit(X=vwX_lst,convert_to_vw=False)

preds=mod.predict(X=vwX_T_lst,convert_to_vw=False)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in ()
     16 mod=slvw.VWRegressor(passes=5, quadratic="aa ab")
     17 mod.fit(X=vwX_lst,convert_to_vw=False)
---> 18 preds=mod.predict(X=vwX_T_lst,convert_to_vw=False)
     19 

/home/anaconda/lib/python2.7/site-packages/vowpal_wabbit/python/sklearn_vw.pyc in predict(self, X, convert_to_vw)
    255             ex.set_test_only(True)
    256             ex.learn()
--> 257             y[idx] = ex.get_simplelabel_prediction()
    258             ex.finish()
    259 

IndexError: index 1 is out of bounds for axis 0 with size 1

Turbo · Accepted Answer

I just ran into this myself. The problem is about 10 lines above the line where you get the error, in the predict method of sklearn_vw. It reads:

try:
    num_samples = X.shape[0] if X.ndim > 1 else 1
except AttributeError:
    num_samples = 1

num_samples is then used to initialize an empty numpy array of that size:

y = np.empty([num_samples])

So if X doesn't have the attribute ndim (a plain Python list does not) or if X.ndim == 1, then num_samples is set to 1, and the numpy array is initialized with a size of 1.

So when the second prediction score is written to y, you get the index-out-of-bounds error here:

y[idx] = ex.get_simplelabel_prediction()
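
To see the failure in isolation, here is a minimal sketch that does not use Vowpal Wabbit at all. It only mimics the allocation and the indexed write described above; the two prediction scores are made-up placeholders standing in for what `ex.get_simplelabel_prediction()` would return:

```python
import numpy as np

# Two hypothetical prediction scores, one per test example.
predictions = [0.25, 0.75]

# What the wrapper allocates when num_samples has been set to 1.
y = np.empty([1])

error_message = None
try:
    for idx, score in enumerate(predictions):
        # idx == 0 fits; idx == 1 is outside the one-element array.
        y[idx] = score
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

The first write succeeds and the second raises, which matches the traceback in the question pointing at index 1.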

I fixed this by changing the try/except code to use the length of X:

try:
    num_samples = X.shape[0] if X.ndim > 1 else len(X)
except AttributeError:
    num_samples = len(X)
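
The corrected logic can be checked on its own, outside the wrapper. The sketch below wraps it in a standalone function; the name `count_samples` and the sample VW lines are hypothetical and only for illustration:

```python
import numpy as np


def count_samples(X):
    """Return the number of examples in X, using the corrected try/except logic."""
    try:
        # 2-D (or higher) arrays: one example per row.
        # 1-D arrays: one example per element.
        return X.shape[0] if X.ndim > 1 else len(X)
    except AttributeError:
        # Plain Python lists have no ndim attribute.
        return len(X)


# Hypothetical pre-formatted VW lines, as in the question's list of strings.
lines = ["1 |a x:0.5", "1 |a x:0.7", "1 |a x:0.9"]

# The output array now has one slot per example, so every index is valid.
y = np.empty([count_samples(lines)])
for idx in range(len(lines)):
    y[idx] = 0.0  # placeholder for a real prediction score

print(count_samples(lines), y.shape)
```

One caveat with this approach: a single example passed as a bare string would be counted by its characters, so a lone example should still be wrapped in a list.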
