learner

Reputation: 2752

How to get a one-hot encoded vector as in the table below

I am trying to get my table into the following form. For some reason, I could not get my pivot code working.

df = pd.DataFrame([('a','f1'), ('a','f2'),('a','f3') ,('b','f4'),('c','f2'), ('c','f4')], columns = ['user', 'val'])


df 
---
user    val
a      f1
a      f2
a      f3
b      f4
c      f2
c      f4 


>> output 

user    f1  f2  f3  f4
a       1   1   1   0
b       0   0   0   1
c       0   1   0   1

Upvotes: 4

Views: 102

Answers (3)

MaxU - stand with Ukraine

Reputation: 210932

Yet another solution.

In [82]: from sklearn.feature_extraction.text import CountVectorizer

In [83]: cv = CountVectorizer()

In [84]: d2 = df.groupby('user')['val'].agg(' '.join).reset_index(name='val')

In [85]: d2
Out[85]:
  user       val
0    a  f1 f2 f3
1    b        f4
2    c     f2 f4

In [86]: r = pd.DataFrame.sparse.from_spmatrix(cv.fit_transform(d2['val']),
    ...:                                       index=d2.index,
    ...:                                       columns=cv.get_feature_names_out())
    ...:

In [88]: d2[['user']].join(r)
Out[88]:
  user  f1  f2  f3  f4
0    a   1   1   1   0
1    b   0   0   0   1
2    c   0   1   0   1

Upvotes: 2

cs95

Reputation: 402922

Option 1
get_dummies with groupby + sum

df.set_index('user').val.str.get_dummies().groupby(level=0).sum()

      f1  f2  f3  f4
user                
a      1   1   1   0
b      0   0   0   1
c      0   1   0   1

Option 2
groupby + value_counts + unstack

df.groupby('user').val.value_counts().unstack(fill_value=0)

val   f1  f2  f3  f4
user                
a      1   1   1   0
b      0   0   0   1
c      0   1   0   1

Option 3
pivot_table with size as the aggfunc.

df.pivot_table(index='user', columns='val', aggfunc='size', fill_value=0)

val   f1  f2  f3  f4
user                
a      1   1   1   0
b      0   0   0   1
c      0   1   0   1
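
A side note on the options above: they all count occurrences, so a repeated (user, val) pair would give 2 rather than 1. If strict 0/1 indicators are needed, one possible approach is to clip the counts. This is only a sketch; the duplicated row and the names df_dup, counts, and one_hot are hypothetical and not part of the question.

```python
import pandas as pd

df = pd.DataFrame(
    [('a', 'f1'), ('a', 'f2'), ('a', 'f3'), ('b', 'f4'), ('c', 'f2'), ('c', 'f4')],
    columns=['user', 'val'],
)

# Hypothetical duplicate row, added only to show the effect on the counts.
extra = pd.DataFrame([('a', 'f1')], columns=['user', 'val'])
df_dup = pd.concat([df, extra], ignore_index=True)

# Same idea as Option 2: count each (user, val) pair, then spread val into columns.
counts = df_dup.groupby('user')['val'].value_counts().unstack(fill_value=0)

# Cap every count at 1 so the table holds 0/1 indicators only.
one_hot = counts.clip(upper=1)

print(counts)
print(one_hot)
```

Alternatively, calling df.drop_duplicates() before any of the three options should have the same effect.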

Upvotes: 5

learner

Reputation: 2752

Seems like pd.crosstab(df['user'], df['val']) works too.
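
For completeness, a minimal sketch of the crosstab call on the question's data, flattened so that user becomes an ordinary column as in the desired output. The names table and flat are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('a', 'f1'), ('a', 'f2'), ('a', 'f3'), ('b', 'f4'), ('c', 'f2'), ('c', 'f4')],
    columns=['user', 'val'],
)

# Frequency table: one row per user, one column per distinct val.
table = pd.crosstab(df['user'], df['val'])

# Move 'user' out of the index and drop the leftover 'val' label on the columns.
flat = table.reset_index()
flat.columns.name = None

print(flat)
```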

Upvotes: 3
