pr22

Reputation: 189

Python Join List

I was recently reading about implementing a VIF function in Python and came across this article.

Link to the article

I am not able to understand the operation taking place in this particular line.

features = "+".join(df.columns - ["annual_inc"])

I understand what the output would be when the statement is

features = "+".join(df.columns)

Can anyone explain the significance of - ["annual_inc"] in the statement?

Upvotes: 1

Views: 184

Answers (3)

Shijith

Reputation: 4872

For patsy.dmatrices, the first argument of the function is formula_like, which has to be a string like y ~ x1 + x2. Here, features is a string of all the columns joined with +, except your target variable, annual_inc. Next you have to create the input string for formula_like, i.e. target ~ variable1 + variable2 + ..., which in your case is 'annual_inc ~' + features.

dmatrices('annual_inc ~' + features, df, return_type='dataframe')

Refer to the patsy.dmatrices documentation.
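To make the string building concrete, here is a minimal sketch in plain Python that needs neither pandas nor patsy. The column names loan_amnt and int_rate are made up for illustration; only annual_inc comes from the question.

```python
# Hypothetical column names; only "annual_inc" comes from the question.
columns = ["loan_amnt", "int_rate", "annual_inc"]
target = "annual_inc"

# Join every column except the target with "+" to form the right-hand side.
features = "+".join(col for col in columns if col != target)

# Prepend the target and "~" to get the formula string.
formula = target + " ~ " + features
print(formula)  # annual_inc ~ loan_amnt+int_rate
```

The resulting string is what gets passed as the first argument to dmatrices.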

Upvotes: 1

Neo

Reputation: 627

"annual_inc" is the target variable of the regression, and therefore is excluded from the set of features.

Upvotes: 1

jezrael

Reputation: 862681

I think the article uses old pandas code; in current versions it raises an error:

import pandas as pd

df = pd.DataFrame(columns=['a','b','annual_inc'])

print (df.columns - ["annual_inc"])

TypeError: unsupported operand type(s) for -: 'str' and 'str'

So use Index.difference to exclude the values in the list from the column names:

print(df.columns.difference(["annual_inc"]))
Index(['a', 'b'], dtype='object')

features = "+".join(df.columns.difference(["annual_inc"]))
print(features)
a+b
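One thing to keep in mind: Index.difference returns the remaining labels sorted, so the feature order may differ from the column order in the DataFrame. If the original order matters, Index.drop is a possible alternative. A small sketch, with the columns deliberately out of alphabetical order (an assumption for illustration, not the layout of the article's data):

```python
import pandas as pd

# Column order chosen deliberately so that the sorting is visible.
df = pd.DataFrame(columns=["b", "a", "annual_inc"])

# Index.difference returns the remaining labels in sorted order.
sorted_features = "+".join(df.columns.difference(["annual_inc"]))

# Index.drop removes the label and keeps the original column order.
ordered_features = "+".join(df.columns.drop("annual_inc"))

print(sorted_features)   # a+b
print(ordered_features)  # b+a
```

For a regression formula the order of the terms usually does not change the fit, so either form should work here.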

Upvotes: 2
