Reputation: 4815
I am working my way through Wes McKinney's Python for Data Analysis, and I've run into a strange problem that the book does not address.
In the code below, based on page 199 of his book, I create a DataFrame and then use pd.cut() to create cat_obj. According to the book, cat_obj is
"a special Categorical object. You can treat it like an array of strings indicating the bin name; internally it contains a levels array indicating the distinct category names along with a labeling for the ages data in the labels attribute"
Awesome! However, if I use the exact same pd.cut() code (In [5] below) to create a new column of the DataFrame (called df['cat']), that column is not treated as a special categorical variable but simply as a regular pandas Series.
How, then, do I create a column in a dataframe that is treated as a categorical variable?
In [4]:
import pandas as pd
raw_data = {'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'score': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns=['name', 'score'])
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']
In [5]:
cat_obj = pd.cut(df['score'], bins, labels=group_names)
df['cat'] = pd.cut(df['score'], bins, labels=group_names)
In [7]:
type(cat_obj)
Out[7]:
pandas.core.categorical.Categorical
In [8]:
type(df['cat'])
Out[8]:
pandas.core.series.Series
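For comparison, here is a minimal sketch of the same steps on a recent pandas (assumed 0.15 or later, where Series gained a `category` dtype). There, the new column keeps the categorical dtype, and what the book calls the `levels` array and the `labels` attribute are exposed through the `.cat.categories` and `.cat.codes` accessors:

```python
import pandas as pd

raw_data = {'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'score': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns=['name', 'score'])
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

# On a recent pandas the assigned column keeps the categorical dtype
df['cat'] = pd.cut(df['score'], bins, labels=group_names)

print(df['cat'].dtype)                  # category
print(list(df['cat'].cat.categories))   # ['Low', 'Okay', 'Good', 'Great']
print(list(df['cat'].cat.codes))        # integer position of each row's label
```

Note that `type(df['cat'])` is still a Series; the categorical behaviour is carried by its dtype.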
Upvotes: 16
Views: 8138
Reputation: 803
From pandas 0.15 onwards (see http://pandas-docs.github.io/pandas-docs-travis/categorical.html), you can either
Specify dtype="category" when constructing a Series:
In [1]: s = pd.Series(["a","b","c","a"], dtype="category")
In [2]: s
Out[2]:
0 a
1 b
2 c
3 a
dtype: category
Categories (3, object): [a, b, c]
You can then add this Series to an existing DataFrame as a new column.
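A minimal sketch of adding such a categorical Series to an existing DataFrame as a column, assuming pandas 0.15 or later; the column names `A` and `grade` are just illustrative. The column keeps its `category` dtype after assignment:

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")
df = pd.DataFrame({"A": [1, 2, 3, 4]})

# Both objects share the default 0..3 index, so the values line up row by row
df["grade"] = s

print(df.dtypes)  # A is int64, grade is category
```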
Or convert an existing Series or column to a category dtype:
In [3]: df = pd.DataFrame({"A":["a","b","c","a"]})
In [4]: df["B"] = df["A"].astype('category')
In [5]: df
Out[5]:
A B
0 a a
1 b b
2 c c
3 a a
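`astype` also accepts an explicit categorical dtype. A sketch assuming a newer pandas where `pd.CategoricalDtype` is available at the top level (it did not exist in 0.15); the `Low`/`Okay`/`Good`/`Great` labels are borrowed from the question and the `rating` column is hypothetical. Fixing the category order makes comparisons and sorting follow that order instead of alphabetical string order:

```python
import pandas as pd

df = pd.DataFrame({"rating": ["Good", "Low", "Great", "Okay", "Good"]})

# Explicit, ordered categories instead of the inferred (sorted) ones
rating_type = pd.CategoricalDtype(categories=["Low", "Okay", "Good", "Great"],
                                  ordered=True)
df["rating_cat"] = df["rating"].astype(rating_type)

# Comparisons and sorting follow the declared category order
print(df[df["rating_cat"] > "Okay"])
print(df.sort_values("rating_cat")["rating"].tolist())
```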
Upvotes: 0
Reputation: 4800
It might be happening because of this kind of behaviour, where a value assigned through a property does not come back with the type you assigned. Sample getter and setter:
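class a:
    x = 1

    @property
    def p(self):
        # the getter converts the stored value to int
        return int(self.x)

    @p.setter
    def p(self, v):
        self.x = v

t = 1.32
obj = a()           # reuse one instance so the assigned value is kept
obj.p = 1.32        # store a float through the setter
print(type(t))      # --> <class 'float'>
print(type(obj.p))  # --> <class 'int'>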
For now, a DataFrame only accepts Series data, and its setter converts Categorical data into a Series. Categorical support in DataFrames is due in the next pandas release.
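Even on later pandas versions the column is still wrapped in a Series, so `type(df['cat'])` keeps returning Series; the categorical nature shows up in the dtype instead. A minimal sketch, assuming a recent pandas with categorical support, of checking the dtype and getting the underlying Categorical back:

```python
import pandas as pd

df = pd.DataFrame({"score": [25, 94, 57, 62]})
df["cat"] = pd.cut(df["score"], [0, 25, 50, 75, 100],
                   labels=["Low", "Okay", "Good", "Great"])

print(type(df["cat"]))   # the container is still a Series
print(df["cat"].dtype)   # category
# .values hands back the underlying Categorical for a category column
print(isinstance(df["cat"].values, pd.Categorical))  # True
```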
Upvotes: 1
Reputation: 1
Right now, you can't have categorical data in a Series or DataFrame object, but this functionality will be implemented in Pandas 0.15 (due in September).
Upvotes: 0