I'm trying to run a multiple linear regression in Python. One of my columns, "member_total", has dtype object, and I can't figure out how to convert it to an int. Right now, when I run the OLS model, this variable is treated as categorical, so I get tons of coefficients for it.
I suspect the issue is that "member_total" is an object column, but I can't figure out how to convert it.
I've tried:
member_total = int(sub.member_total)
and get this error:
TypeError: cannot convert the series to <class 'int'>
I've also tried:
sub = sub.astype(int)
and get this error:
ValueError: invalid literal for int() with base 10: '27,908'
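It looks like you were using the replace method incorrectly. When you want to modify a column, you have to select it from the DataFrame it belongs to and assign the result back. Yours appears to be called sub, with the column in question being member_total. Also note that Series.replace only swaps cells whose entire value matches unless you pass regex=True, so removing the comma from inside a string like '27,908' needs either the .str accessor or regex=True. So the correct way to strip the commas would be:
sub['member_total'] = sub['member_total'].str.replace(',', '', regex=False)
or
sub['member_total'] = sub['member_total'].replace(',', '', regex=True)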
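A quick way to see the difference between whole-value and substring replacement is to try both on a small Series. The values below are hypothetical sample data standing in for sub['member_total']; this is only a sketch to illustrate the behaviour.

```python
import pandas as pd

# Hypothetical sample data standing in for sub['member_total']
s = pd.Series(['27,908', '1,204', '530'])

# Without regex=True, Series.replace only swaps cells whose *entire* value is ','
unchanged = s.replace(',', '')
print(unchanged.tolist())  # ['27,908', '1,204', '530']

# The .str accessor replaces the substring inside each value
cleaned = s.str.replace(',', '', regex=False)
print(cleaned.tolist())  # ['27908', '1204', '530']

# regex=True also makes Series.replace operate on substrings
also_cleaned = s.replace(',', '', regex=True)
print(also_cleaned.tolist())  # ['27908', '1204', '530']

# Once the commas are gone, the conversion to int succeeds
print(cleaned.astype(int).tolist())  # [27908, 1204, 530]
```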
To do the cleanup and the conversion in one line:
sub['member_total'] = sub['member_total'].str.replace(',', '', regex=False).astype(int)
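If sub was loaded with pandas.read_csv (an assumption; the question doesn't say where the data comes from), the thousands parameter can drop the separators while parsing, so the column arrives as integers and no cleanup is needed afterwards. The CSV contents below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV contents; values containing commas are quoted so the
# comma isn't mistaken for a field separator
csv_text = 'name,member_total\nA,"27,908"\nB,"1,204"\nC,530\n'

# thousands=',' tells the parser to ignore the thousands separator
sub = pd.read_csv(io.StringIO(csv_text), thousands=',')

print(sub['member_total'].dtype)     # an integer dtype (int64)
print(sub['member_total'].tolist())  # [27908, 1204, 530]
```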
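If this still fails (for example, because some values contain other non-digit characters such as spaces), a more robust method is to strip every non-digit character. Keep in mind that this also removes minus signs and decimal points, so it only suits non-negative whole numbers:
sub['member_total'] = sub['member_total'].replace(r'\D', '', regex=True).astype(int)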
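If the column also contains missing or non-numeric entries, astype(int) will still raise. One possible alternative, sketched here with hypothetical sample values, is pd.to_numeric with errors='coerce', which turns anything unparseable into NaN, followed by the nullable 'Int64' dtype. You would then need to decide whether to drop or fill those rows before fitting the regression.

```python
import pandas as pd

# Hypothetical column with a missing value and a stray non-numeric entry
s = pd.Series(['27,908', None, '1,204', 'n/a'])

# Remove the thousands separators, then parse; errors='coerce' turns
# unparseable values (None, 'n/a') into NaN instead of raising
nums = pd.to_numeric(s.str.replace(',', '', regex=False), errors='coerce')

# The nullable 'Int64' dtype keeps whole numbers while allowing missing values
nums = nums.astype('Int64')
print(nums)
```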