Reputation: 365
I have data that looks like this, which I have imported into Python using pandas:
new_id code
001722 A01.09
001723 A01.09
001723 A02.0
001724 A02.9
001724 A03.9
001725 A03.9
I would like to convert this to a matrix of 1s and 0s for presence/absence, presumably also using pandas, so that it looks like this:
new_id A01.09 A02.0 A02.9 A03.9
001722 1 0 0 0
001723 1 1 0 0
001724 0 0 1 1
001725 0 0 0 1
I have to do this for several different types of data; "code" here is just one of them. Once I have these matrices, I would also like to merge them into one large matrix using "new_id" as the merging column, appending the new data to the right of the matrix.
I have done this in R; however, I need to do it in Python for a number of reasons, and I am new to the language.
Upvotes: 1
Views: 882
Reputation: 2534
You might want to use pivot_table:
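# Helper column: every (new_id, code) row counts as one occurrence
df['COUNT'] = 1
# Pivot codes into columns; pairs that never occur come out as NaN, so fill them with 0
print(df.pivot_table('COUNT', index='new_id', columns="code").fillna(0))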
(If you want integers, append '.astype(int)' after '.fillna(0)' on the last line. Before the fill, the table holds 1 and NaN values, and the NaN forces the dtype to float64.)
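For the merging part of the question, here is a minimal sketch rather than a definitive recipe. It swaps in pd.crosstab as an alternative to pivot_table because it returns integer counts directly, and it joins the matrices on their shared "new_id" index. The second data set (the "other" frame with a "drug" column and the values "X1" and "X2") is made up purely for illustration, and new_id is assumed to be stored as a string so the leading zeros survive.

```python
import pandas as pd

# Sample data from the question; new_id is kept as a string to preserve leading zeros.
df = pd.DataFrame({
    "new_id": ["001722", "001723", "001723", "001724", "001724", "001725"],
    "code": ["A01.09", "A01.09", "A02.0", "A02.9", "A03.9", "A03.9"],
})

# crosstab counts each (new_id, code) pair; clip(upper=1) turns counts
# into presence/absence in case a pair occurs more than once.
codes = pd.crosstab(df["new_id"], df["code"]).clip(upper=1)

# Hypothetical second data type, invented for this example only.
other = pd.DataFrame({
    "new_id": ["001722", "001724", "001725"],
    "drug": ["X1", "X2", "X1"],
})
drugs = pd.crosstab(other["new_id"], other["drug"]).clip(upper=1)

# Both matrices are indexed by new_id, so join appends the new columns on the right.
# how="outer" keeps ids that appear in only one matrix; the resulting NaN become 0.
merged = codes.join(drugs, how="outer").fillna(0).astype(int)
print(merged)
```

If two data types could share column names, passing lsuffix/rsuffix to join (or renaming the columns first) avoids a clash. For more than two matrices, pd.concat([codes, drugs, ...], axis=1) followed by the same fillna(0).astype(int) should give the same result.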
Upvotes: 1