Reputation: 293
Here is my example df:
          doc_num
doc1 doc2
A    B       U123
A    C       U123
A    D       U124
B    C       U126
B    D       U126
I used
pd.get_dummies(df.doc_num).sort_index(level=0)
to make a vector matrix like this:
           U123  U124  U126
doc1 doc2
A    B        1     0     0
A    C        1     0     0
A    D        0     1     0
B    C        0     0     1
B    D        0     0     1
But I would like to concatenate doc1 and doc2 with a comma to create a new doc_3 column, so that the expected result looks like this:
       U123  U124  U126
doc_3
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1
Is it possible? Thank you in advance.
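To reproduce the setup, here is a minimal sketch that rebuilds the example frame. It assumes, as the printed layout suggests, that doc1 and doc2 are the two levels of the row index and doc_num is the only column. It also passes dtype=int to get_dummies, because pandas 2.0 and later return True/False dummies by default instead of the 1/0 shown above.

```python
import pandas as pd

# Assumed reconstruction: doc1/doc2 form a two-level row index, doc_num is the data column
df = pd.DataFrame(
    {"doc_num": ["U123", "U123", "U124", "U126", "U126"]},
    index=pd.MultiIndex.from_tuples(
        [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")],
        names=["doc1", "doc2"],
    ),
)

# One indicator column per distinct doc_num value; dtype=int keeps 1/0 instead of booleans
dummies = pd.get_dummies(df.doc_num, dtype=int).sort_index(level=0)
print(dummies)
```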
Upvotes: 3
Views: 96
Reputation: 71600
In addition to @jezrael's answer: since you want a vector matrix, do:
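df = df.reset_index()  # doc1/doc2 are index levels in the question, so turn them into columns first
df1 = pd.get_dummies(df.doc_num)
df1.insert(0, 'doc_3', df['doc1'] + ',' + df['doc2'])
print(df1.set_index('doc_3'))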
Or:
df = df.reset_index()  # doc1/doc2 must be columns before they can be popped
df1 = pd.get_dummies(df.doc_num)
df1['doc_3'] = df.pop('doc1') + ',' + df.pop('doc2')
print(df1.set_index('doc_3'))
Both print:
       U123  U124  U126
doc_3
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1
Either way, you get exactly your desired output.
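If you would rather not call reset_index at all, one variant (swapped in here, not part of the answer above) reads the two index levels directly with Index.get_level_values and concatenates them element-wise. This sketch assumes the same MultiIndex frame as in the question and uses dtype=int so the dummies print as 1/0 on pandas 2.0+.

```python
import pandas as pd

df = pd.DataFrame(
    {"doc_num": ["U123", "U123", "U124", "U126", "U126"]},
    index=pd.MultiIndex.from_tuples(
        [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")],
        names=["doc1", "doc2"],
    ),
)

df1 = pd.get_dummies(df.doc_num, dtype=int)

# Element-wise string concatenation of the index levels: "A" + "," + "B" -> "A,B"
doc1 = df.index.get_level_values("doc1")
doc2 = df.index.get_level_values("doc2")
df1.index = (doc1 + "," + doc2).rename("doc_3")
print(df1)
```

Because get_dummies keeps the row order of its input, the new labels line up with the dummy rows without any explicit alignment.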
Upvotes: 1
Reputation: 4506
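You can try the code below. It combines the two columns into one, with "," between them. Since doc1 and doc2 are index levels in the question, move them into columns first:
df = df.reset_index()
df['doc_3'] = df['doc1'] + "," + df['doc2']
Then you can drop the first two columns (doc1 and doc2).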
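Put together, the column-based route might look like the sketch below, assuming the question's MultiIndex frame: build doc_3, drop the original columns, then run get_dummies on a Series indexed by doc_3 so the combined label carries through to the result.

```python
import pandas as pd

df = pd.DataFrame(
    {"doc_num": ["U123", "U123", "U124", "U126", "U126"]},
    index=pd.MultiIndex.from_tuples(
        [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")],
        names=["doc1", "doc2"],
    ),
)

df = df.reset_index()  # doc1/doc2 become ordinary columns
df["doc_3"] = df["doc1"] + "," + df["doc2"]
df = df.drop(columns=["doc1", "doc2"])

# get_dummies on a Series keeps that Series' index, so doc_3 becomes the result's index
out = pd.get_dummies(df.set_index("doc_3")["doc_num"], dtype=int)
print(out)
```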
Upvotes: 0
Reputation: 863166
I believe you need to join both levels of the MultiIndex, then set the index name with rename_axis:
df1 = pd.get_dummies(df.doc_num).sort_index(level=0)
df1.index = df1.index.map(','.join)
df1 = df1.rename_axis('doc_3')
print (df1)
       U123  U124  U126
doc_3
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1
Then add reset_index if you need doc_3 as a column:
df1 = df1.reset_index()
print (df1)
  doc_3  U123  U124  U126
0   A,B     1     0     0
1   A,C     1     0     0
2   A,D     0     1     0
3   B,C     0     0     1
4   B,D     0     0     1
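The map(','.join) step works because MultiIndex.map hands each row label to the function as a tuple, so it is not tied to exactly two levels. Below is a sketch wrapping that idea in a small helper; the name dummies_with_joined_index and its sep/name parameters are hypothetical, and it assumes every index level holds strings (str.join fails on numbers).

```python
import pandas as pd

def dummies_with_joined_index(s, sep=",", name="doc_3"):
    """Hypothetical helper: one-hot encode s and flatten its MultiIndex into sep-joined labels."""
    out = pd.get_dummies(s, dtype=int).sort_index(level=0)
    # Each label arrives as a tuple such as ("A", "B"); sep.join turns it into "A,B"
    out.index = out.index.map(sep.join)
    return out.rename_axis(name)

df = pd.DataFrame(
    {"doc_num": ["U123", "U123", "U124", "U126", "U126"]},
    index=pd.MultiIndex.from_tuples(
        [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")],
        names=["doc1", "doc2"],
    ),
)

df1 = dummies_with_joined_index(df.doc_num)
print(df1)
print(df1.reset_index())  # doc_3 as a regular column instead of the index
```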
Or first use reset_index to turn the MultiIndex levels into columns, then extract them with pop if you want the result as the index:
df1 = pd.get_dummies(df.doc_num).sort_index(level=0).reset_index()
df1.index = df1.pop('doc1') + ',' + df1.pop('doc2')
df1 = df1.rename_axis('doc_3')
print (df1)
       U123  U124  U126
doc_3
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1
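The same pop-based step can also be written with Series.str.cat instead of the + operator; this is a swapped-in alternative, not what the answer uses. One practical difference is that str.cat accepts na_rep, so a missing doc1 or doc2 can be replaced by a placeholder instead of turning the whole label into NaN. A sketch, assuming the question's frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"doc_num": ["U123", "U123", "U124", "U126", "U126"]},
    index=pd.MultiIndex.from_tuples(
        [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")],
        names=["doc1", "doc2"],
    ),
)

df1 = pd.get_dummies(df.doc_num, dtype=int).sort_index(level=0).reset_index()
doc1 = df1.pop("doc1")
doc2 = df1.pop("doc2")

# str.cat aligns both Series on their shared RangeIndex and puts sep between the values
df1.index = pd.Index(doc1.str.cat(doc2, sep=","), name="doc_3")
print(df1)
```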
Or use insert to add doc_3 as a new column:
df1 = pd.get_dummies(df.doc_num).sort_index(level=0).reset_index()
df1.insert(0, 'doc_3', df1.pop('doc1') + ',' + df1.pop('doc2'))
print (df1)
  doc_3  U123  U124  U126
0   A,B     1     0     0
1   A,C     1     0     0
2   A,D     0     1     0
3   B,C     0     0     1
4   B,D     0     0     1
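If more than two key columns ever need joining, listing them once and joining row-wise with DataFrame.agg(",".join, axis=1) avoids chaining + ',' + for each pair. This is a swapped-in idiom rather than part of the answer; the key_cols list is an assumption you would adapt, and the sketch again assumes the question's frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"doc_num": ["U123", "U123", "U124", "U126", "U126"]},
    index=pd.MultiIndex.from_tuples(
        [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")],
        names=["doc1", "doc2"],
    ),
)

df1 = pd.get_dummies(df.doc_num, dtype=int).sort_index(level=0).reset_index()
key_cols = ["doc1", "doc2"]  # assumed list of key columns; extend it to join more of them

# ",".join is applied to each row's values, e.g. ["A", "B"] -> "A,B"
df1.insert(0, "doc_3", df1[key_cols].agg(",".join, axis=1))
df1 = df1.drop(columns=key_cols)
print(df1)
```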
Upvotes: 0