Reputation: 1667
My dataframe appears to be non-numeric after some transformations (see my previous post on dropping duplicates: drop duplicates pandas dataframe).
When I use it in a statsmodels regression I get this error:
ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).
Can I convert the entire dataframe back to numeric somehow?
Using the dataframe with sklearn works for some reason.
I am actually not sure what the data type is. I only noticed that the dataframe is no longer colored when I opened it in Spyder. type(df) just tells me that it is a dataframe.
This is an example from the post I mentioned where the transformation occurs (compare the df before and after the last line):
import pandas as pd

dict1 = [{'var0': 0, 'var1': 0, 'var2': 2},
         {'var0': 0, 'var1': 0, 'var2': 4},
         {'var0': 0, 'var1': 0, 'var2': 8},
         {'var0': 0, 'var1': 0, 'var2': 12}]
df = pd.DataFrame(dict1, index=['s1', 's2', 's1', 's2'])
# Drop duplicate columns by transposing, de-duplicating rows, and transposing back
df = df.reset_index().T.drop_duplicates().T.set_index('index')
This is the dataframe before running the last line:
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, s1 to s2
Data columns (total 3 columns):
var0 4 non-null int64
var1 4 non-null int64
var2 4 non-null int64
dtypes: int64(3)
And this is after:
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, s1 to s2
Data columns (total 2 columns):
var0 4 non-null object
var2 4 non-null object
dtypes: object(2)
memory usage: 96.0+ bytes
After the transformation:
print(df)
      var0 var2
index
s1       0    2
s2       0    4
s1       0    8
s2       0   12
Upvotes: 2
Views: 6283
Reputation: 28253
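The issue with the answer in the linked post is that the transformation converts the integers to objects. reset_index() moves the textual index labels into a regular column, so after the transpose every column holds both integers and a string label, and pandas falls back to the object dtype. Transposing back does not restore the integer dtype.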
Instead, you can sidestep the issue like this:
out = df.reset_index(drop=True).T.drop_duplicates().T.set_index(df.index)
out
var0 var2
s1 0 2
s2 0 4
s1 0 8
s2 0 12
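To check the difference yourself, here is a minimal sketch that reuses the sample data from the question and compares the dtypes produced by the two pipelines. The names rows, with_index and without_index are only illustrative.

```python
import pandas as pd

rows = [
    {'var0': 0, 'var1': 0, 'var2': 2},
    {'var0': 0, 'var1': 0, 'var2': 4},
    {'var0': 0, 'var1': 0, 'var2': 8},
    {'var0': 0, 'var1': 0, 'var2': 12},
]
df = pd.DataFrame(rows, index=['s1', 's2', 's1', 's2'])

# Pipeline from the question: the string index becomes a column,
# so the transpose mixes strings and integers in every column.
with_index = df.reset_index().T.drop_duplicates().T.set_index('index')

# Alternative: drop the index before transposing and reattach it afterwards,
# so only integers are ever transposed.
without_index = (
    df.reset_index(drop=True).T.drop_duplicates().T.set_index(df.index)
)

print(with_index.dtypes)     # dtypes after the question's pipeline
print(without_index.dtypes)  # dtypes after the alternative pipeline
```

Printing df.dtypes (or calling df.info()) is also a quick way to see what a dataframe actually holds when type(df) is not informative.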
Or, if your actual data is different enough that you can't use the above, you can always cast explicitly (astype returns a new dataframe, so assign the result if you want to keep it):
out.astype(int)
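If the columns are not all integers, a single astype(int) may not fit. As a rough sketch, assuming a hypothetical object-dtype frame obj_df like the one in the question, these are three ways to get numeric dtypes back; which one suits your data is a judgment call.

```python
import pandas as pd

# Hypothetical object-dtype frame, similar to the result in the question.
obj_df = pd.DataFrame(
    {'var0': [0, 0, 0, 0], 'var2': [2, 4, 8, 12]},
    index=['s1', 's2', 's1', 's2'],
    dtype=object,
)

# Force every column to integer; raises if a value cannot be converted.
as_int = obj_df.astype(int)

# Let pandas choose a numeric dtype column by column.
per_column = obj_df.apply(pd.to_numeric)

# Soft conversion: infer better dtypes for object columns where possible.
inferred = obj_df.infer_objects()

print(per_column.dtypes)
```

Any of these results should be accepted by code that rejects object-dtype input, since the columns are numeric again.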
Upvotes: 3