Melderon

Reputation: 365

Convert pandas dataframe into a matrix of 1's and 0's for presence/absence

I have data like this, which I have imported into Python using pandas:

new_id  code
001722  A01.09
001723  A01.09
001723  A02.0
001724  A02.9
001724  A03.9
001725  A03.9

I would like to convert this to a matrix of 1's and 0's for presence/absence, presumably also using pandas, to look like this:

new_id  A01.09  A02.0  A02.9  A03.9
001722       1      0      0      0
001723       1      1      0      0
001724       0      0      1      1
001725       0      0      0      1

I have to do this for several different types of data. "code" here is just one type of data I need to convert. So, once I have these matrices, I would also like to merge them into one large matrix using "new_id" as the merging column, appending the new data to the right of the matrix.

I have done this in R, but I need to do it in Python for a number of reasons, and I am new to the language.

Upvotes: 1

Views: 882

Answers (1)

tgrandje

Reputation: 2534

You might want to use pivot_table:

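df['COUNT'] = 1  # helper column: every (new_id, code) row counts as present
# One row per new_id, one column per code; missing pairs come out as NaN and are filled with 0
print(df.pivot_table('COUNT', index='new_id', columns='code').fillna(0))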

(If you want to stick to integers, append '.astype(int)' to the last line. Before the missing values are filled with 0, the table holds only "1" and "NaN" values, and the "NaN" forces the columns to float64.)
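For the second part of the question (merging several such matrices on "new_id"), here is a minimal sketch rather than a definitive solution. It swaps pivot_table for pd.crosstab, which counts occurrences directly, and then joins the matrices on their shared index. The helper name presence_matrix and the second table (other, with a "drug" column) are made up for illustration; only the "code" data comes from the question.

```python
import pandas as pd

# Sample data from the question; new_id is kept as a string to preserve leading zeros.
df = pd.DataFrame({
    "new_id": ["001722", "001723", "001723", "001724", "001724", "001725"],
    "code": ["A01.09", "A01.09", "A02.0", "A02.9", "A03.9", "A03.9"],
})

def presence_matrix(frame, id_col, value_col):
    """Return a 0/1 matrix: one row per id, one column per distinct value."""
    counts = pd.crosstab(frame[id_col], frame[value_col])
    # Clip counts to 0/1 so repeated (id, value) rows still mean "present".
    return (counts > 0).astype(int)

codes = presence_matrix(df, "new_id", "code")

# Hypothetical second data type, not in the question, to illustrate the merge.
other = pd.DataFrame({
    "new_id": ["001722", "001725"],
    "drug": ["X1", "X2"],
})
drugs = presence_matrix(other, "new_id", "drug")

# Join on the shared new_id index; ids missing from one table get 0 there.
merged = codes.join(drugs, how="outer").fillna(0).astype(int)
print(merged)
```

The outer join keeps every "new_id" that appears in any of the tables. If two data types could share column names, passing lsuffix/rsuffix to join (or renaming the columns beforehand) would be needed.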

Upvotes: 1
