Reputation: 7255
Here's my dataset
body customer_id name
14828 Thank you to apply to us. 5458 Sender A
23117 Congratulation your application is accepted. 5136 Sender B
23125 Your OTP will expire in 10 minutes. 5136 Sender A
Here's my code
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
b = a['body']  # a is the DataFrame shown above
vect = CountVectorizer()
vect.fit(b)
X_vect=vect.transform(b)
pd.DataFrame(X_vect.toarray(), columns=vect.get_feature_names())
The output is
10 application apply ... your
0 0 0 1 0
1 0 1 0 1
2 1 0 0 1
What I need is
body customer_id name 10 application apply ... your
14828 Thank you to apply to us. 5458 Sender A 0 0 1 0
23117 Congratulation your application is accepted. 5136 Sender B 0 1 0 1
23125 Your OTP will expire in 10 minutes. 5136 Sender A 1 0 0 1
How am I supposed to do this? I'd still like to use CountVectorizer so I can modify the function in the future.
Upvotes: 1
Views: 1435
Reputation: 862691
You can pass index=a.index to the DataFrame constructor and then join the result to the original df. join aligns on the index and uses a left join by default:
b = pd.DataFrame(X_vect.toarray(), columns=vect.get_feature_names(), index= a.index)
a = a.join(b)
Or use merge, which needs more parameters because its default is an inner join:
a = a.merge(b, left_index=True, right_index=True, how='left')
Upvotes: 3