JC91

Reputation: 11

Issue converting object to int in Python for OLS regression

I'm trying to run a multiple linear regression in Python. One of my columns, "member_total", has dtype object, and I can't figure out how to convert it to int. Right now, when I run the OLS model, this variable is treated as categorical, so I get a separate coefficient for nearly every distinct value.

[example image]

I suspect the problem is that "member_total" has dtype object, but I can't figure out how to convert it.

I've tried:

member_total = int(sub.member_total)

and get this error:

TypeError: cannot convert the series to <class 'int'>

I've also tried:

sub = sub.astype(int)

and get this error:

ValueError: invalid literal for int() with base 10: '27,908'

df.member_total

dropping comma

Upvotes: 1

Views: 187

Answers (1)

BeRT2me

Reputation: 13242

You were just using the replace method incorrectly. When you want to modify a column, you also have to identify the DataFrame it's in. Yours appears to be called sub, with the column in question being member_total. So, the correct way to use replace would be:

sub['member_total'] = sub['member_total'].replace(',', '', regex=True)

or

sub.replace({'member_total': ','}, {'member_total': ''}, regex=True, inplace=True)

To make everything you're trying to do one line:

sub['member_total'] = sub['member_total'].replace(',', '', regex=True).astype(int)

If this STILL fails, a more robust method would be:

sub['member_total'] = sub['member_total'].replace(r'\D', '', regex=True).astype(int)
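
For anyone who wants to check this on a toy example, here is a minimal sketch. The sample values are made up, and the frame only stands in for your sub. One thing worth knowing: Series.replace without regex=True matches whole cell values only, so a plain replace(',', '') leaves '27,908' as it is. The pd.to_numeric variant at the end is an alternative I'm adding, not something from your code; it may help if some rows still can't be parsed.

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame; the values are made up.
sub = pd.DataFrame({"member_total": ["27,908", "1,204", "87"]})

# Without regex=True, Series.replace only matches entire cell values,
# so commas inside the strings are left untouched.
whole_value = sub["member_total"].replace(",", "")

# With regex=True the pattern is applied inside each string:
# strip every non-digit character, then cast to int.
cleaned = sub["member_total"].replace(r"\D", "", regex=True).astype(int)

# Alternative (an assumption about what you might want): to_numeric with
# errors="coerce" turns values that still cannot be parsed into NaN
# instead of raising.
coerced = pd.to_numeric(
    sub["member_total"].str.replace(",", "", regex=False),
    errors="coerce",
)

# Write the cleaned integers back into the DataFrame.
sub["member_total"] = cleaned
```

Keep in mind that r'\D' also removes minus signs and decimal points, so it is only safe for non-negative whole numbers.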

Upvotes: 1
