Given a series that looks like:
0 foo
1 bar
2 foo
3 foo
4 bar
5 baz
How can I create a dataframe where each column is a mask for a unique value in the series? In this example, it would look like:
     foo    bar    baz
0   True  False  False
1  False   True  False
2   True  False  False
3   True  False  False
4  False   True  False
5  False  False   True
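For reference, the example series can be rebuilt like this. The dict-comprehension baseline is not from the original post; it is a simple sketch that does one elementwise comparison per unique value, so it may be slower than the vectorized answers when there are many unique values.

```python
import pandas as pd

# The example series from the question (default RangeIndex 0..5, no name)
s = pd.Series(["foo", "bar", "foo", "foo", "bar", "baz"])

# Baseline: one boolean column per unique value, in order of first appearance
expected = pd.DataFrame({value: s == value for value in s.unique()})
print(expected)
```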
Let's try pd.factorize + np.eye for a fast, concise solution:
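import numpy as np
import pandas as pd
x, y = pd.factorize(s)  # x: integer codes, y: unique values in order of appearance
pd.DataFrame(np.eye(len(y), dtype=bool)[x], columns=y)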
     foo    bar    baz
0   True  False  False
1  False   True  False
2   True  False  False
3   True  False  False
4  False   True  False
5  False  False   True
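To see why the factorize + np.eye trick works, here is a step-by-step sketch of the intermediate values (variable names are illustrative). pd.factorize maps each element to an integer code; np.eye(k, dtype=bool) is a k x k boolean identity matrix whose row i is the one-hot vector for code i, so indexing it with the codes picks one row per element.

```python
import numpy as np
import pandas as pd

s = pd.Series(["foo", "bar", "foo", "foo", "bar", "baz"])

# Integer codes plus the unique values, in order of first appearance
codes, uniques = pd.factorize(s)
print(codes)  # [0 1 0 0 1 2]

# Row i of the identity matrix is the one-hot vector for code i
identity = np.eye(len(uniques), dtype=bool)

# Fancy indexing with the codes selects one identity row per element of s
onehot = identity[codes]
result = pd.DataFrame(onehot, columns=uniques)
print(result)
```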
Here's one using array initialization:
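import numpy as np
import pandas as pd

def series_hotencode(s):
    a, b = s.factorize()  # a: integer codes, b: unique values
    ar = np.zeros((len(a), len(b)), dtype=bool)
    ar[np.arange(len(a)), a] = True  # set one cell per row, at that row's code
    return pd.DataFrame(ar, columns=b)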
Sample run:
In [40]: s
Out[40]:
0 foo
1 bar
2 foo
3 foo
4 bar
5 baz
Name: 1, dtype: object
In [41]: series_hotencode(s)
Out[41]:
     foo    bar    baz
0   True  False  False
1  False   True  False
2   True  False  False
3   True  False  False
4  False   True  False
5  False  False   True
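One caveat with both factorize-based answers: pd.factorize gives missing values (NaN/None) the code -1, and NumPy reads index -1 as "last row/column", so those rows would silently be marked as the last unique value. Both answers also reset the result to a 0..n-1 index. A minimal sketch that masks out the -1 codes and keeps the original index (series_hotencode_na is a hypothetical name, not from the original answers):

```python
import numpy as np
import pandas as pd

def series_hotencode_na(s):
    """Hypothetical variant: rows with missing values get an all-False mask."""
    codes, uniques = pd.factorize(s)  # missing values receive code -1
    ar = np.zeros((len(codes), len(uniques)), dtype=bool)
    valid = codes >= 0  # skip the -1 sentinel instead of letting it wrap around
    ar[np.flatnonzero(valid), codes[valid]] = True
    # Keep the caller's index rather than resetting to 0..n-1
    return pd.DataFrame(ar, index=s.index, columns=uniques)

s = pd.Series(["foo", None, "bar", "foo"], index=[10, 20, 30, 40])
print(series_hotencode_na(s))
```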
Using str.get_dummies (note that the columns come out sorted alphabetically, not in order of appearance):
s.str.get_dummies().astype(bool)
Out[392]:
     bar    baz    foo
0  False  False   True
1   True  False  False
2  False  False   True
3  False  False   True
4   True  False  False
5  False   True  False
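A related option, sketched here rather than taken from the answer, is the top-level pd.get_dummies, which accepts a dtype argument so the .astype(bool) step can be skipped; reindexing the columns by s.unique() restores the question's order of first appearance. (Series.str.get_dummies is aimed at strings and also supports a sep argument for multi-label values; pd.get_dummies works on any dtype.)

```python
import pandas as pd

s = pd.Series(["foo", "bar", "foo", "foo", "bar", "baz"])

# Build boolean indicator columns directly
dummies = pd.get_dummies(s, dtype=bool)

# Columns come out sorted (bar, baz, foo); reorder to first appearance
dummies = dummies.reindex(columns=s.unique())
print(dummies)
```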
Or, trying something different, pd.crosstab:
pd.crosstab(s.index,s).astype(bool)
Out[395]:
a        bar    baz    foo
row_0
0      False  False   True
1       True  False  False
2      False  False   True
3      False  False   True
4       True  False  False
5      False   True  False
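The crosstab output carries axis names (row_0 for the unnamed index, and the series name on the columns). A small sketch of tidying that up: crosstab counts (row label, value) pairs, which are 0 or 1 when the index is unique, so astype(bool) turns them into masks. Note the assumption of a unique index: if s.index has duplicate labels, crosstab merges those rows and the result has fewer rows than s.

```python
import pandas as pd

s = pd.Series(["foo", "bar", "foo", "foo", "bar", "baz"])

# Count occurrences of each (row label, value) pair; 0 or 1 with a unique index
counts = pd.crosstab(s.index, s)

# Turn counts into booleans and drop the axis names crosstab adds
masks = counts.astype(bool)
masks.index.name = None
masks.columns.name = None
print(masks)
```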