I have this data
I am trying to apply this:
one_hot = pd.get_dummies(df)
But I get this error:
Here is my code up until then:
# Import modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree
df = pd.read_csv('AllMSAData.csv')
df.head()
corr_matrix = df.corr()
corr_matrix
df.describe()
# Get features and targets
labels = np.array(df['CurAV'])
# Remove the labels from the features
# axis 1 refers to the columns
df = df.drop('CurAV', axis = 1)
# Saving feature names for later use
feature_list = list(df.columns)
# Convert to numpy array
df = np.array(df)
Upvotes: 5
Views: 4364
Reputation: 51395
IMO, the documentation should be updated, because it says pd.get_dummies accepts data that is array-like, and a 2-D numpy array is array-like (even though there is no formal definition of array-like). However, pd.get_dummies does not seem to accept multi-dimensional arrays.
Take this tiny example:
>>> df
   a  b  c
0  a  1  d
1  b  2  e
2  c  3  f
You can't get dummies on the underlying 2D numpy array:
>>> pd.get_dummies(df.values)
Exception: Data must be 1-dimensional
But you can get dummies on the dataframe itself:
>>> pd.get_dummies(df)
   b  a_a  a_b  a_c  c_d  c_e  c_f
0  1    1    0    0    1    0    0
1  2    0    1    0    0    1    0
2  3    0    0    1    0    0    1
Or on the 1D array underlying an individual column:
>>> pd.get_dummies(df['a'].values)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
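Applied to the code in the question, this suggests calling pd.get_dummies before the np.array conversion rather than after it. Here is a minimal sketch of that reordering. I don't have AllMSAData.csv, so the small frame below is a made-up stand-in: only the CurAV column name comes from the question, and Region and Size are hypothetical. Passing dtype=int is my own choice to keep the dummy columns numeric; it is not something the question requires.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for AllMSAData.csv. Only 'CurAV' appears in the
# question; 'Region' and 'Size' are invented for illustration.
df = pd.DataFrame({
    "Region": ["a", "b", "c"],
    "Size": [1, 2, 3],
    "CurAV": [10.0, 20.0, 30.0],
})

# Separate the target from the features, keeping the features as a DataFrame.
labels = np.array(df["CurAV"])
features = df.drop("CurAV", axis=1)

# One-hot encode while the data is still a DataFrame.
# dtype=int (an assumption, not from the question) yields 0/1 integer columns.
encoded = pd.get_dummies(features, dtype=int)

# Save the feature names, then convert to a numpy array as the last step.
feature_list = list(encoded.columns)
X = np.array(encoded)

# For comparison: the same call on the 2-D array is rejected.
try:
    pd.get_dummies(np.array(features))
    raised = False
except Exception:
    raised = True
```

The only real change from the question's code is the order: the frame is encoded first and turned into an array afterwards, so get_dummies never sees a 2-D array.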
Upvotes: 3