rajes95

Reputation: 105

Python Pandas: How do I convert a column of categorical labels into binary (one-hot) columns, with a 1 in the column that matches each row's label? Example below:

My initial data is:

Label Data:
        0
1       1
2       1
3       1
4       1
5       1
...    ..
11265  20
11266  20
11267  20
11268  20
11269  20

[11269 rows x 1 columns]

This is what I want:

       1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
1       1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
2       1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
3       1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4       1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
5       1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
...    ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..
11265   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
11266   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
11267   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
11268   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
11269   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1

The way I have attempted it is to loop through all lines of the matrix as follows:

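import pandas as pd

# labelData is a one-column DataFrame (column 0) indexed 1..N
uniqueLabels = labelData[0].unique().tolist()
docNums = range(1, len(labelData) + 1)
# Start from an all-zero matrix: one row per document, one column per label
labelMatrix = pd.DataFrame(0, columns=uniqueLabels, index=docNums)

for n in docNums:
    # .at sets a single cell directly; chained indexing may write to a copy
    labelMatrix.at[n, labelData[0][n]] += 1

print(labelMatrix)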

Is there a more "pandasic" way of approaching this where I don't loop through every row? This works for now, but I actually have millions more rows of data, and the loop takes longer than I would like. Thanks for your help!

SOLUTION: I ended up using the following and it worked great:

labelMatrix = pd.get_dummies(labelData[0])
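
For readers who want to try this on a small input, here is a minimal sketch. The five-row `labelData` frame is a made-up stand-in for the real data, and `dtype=int` is an optional argument added here so the result prints as 0/1 (the default dtype of `get_dummies` depends on the pandas version).

```python
import pandas as pd

# Hypothetical miniature of labelData: one column named 0, index starting at 1.
labelData = pd.DataFrame({0: [1, 1, 2, 3, 3]}, index=range(1, 6))

# One indicator column per distinct label; the original index is preserved.
# dtype=int is an assumption made for display, not part of the original solution.
labelMatrix = pd.get_dummies(labelData[0], dtype=int)

print(labelMatrix)
```

Each row ends up with exactly one 1, in the column named after that row's label, and no Python-level loop is involved.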

Upvotes: 1

Views: 333

Answers (2)

joaoavf

Reputation: 1383

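This should be pretty straightforward with get_dummies (replace 'Data' with the name of your label column, which is 0 in the question):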

pd.get_dummies(df['Data'])
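
One thing to watch: `get_dummies` only creates columns for labels that actually occur in the data. If the output must always have a fixed set of columns (1 through 20 in the question), one possible approach is to reindex the columns afterwards. The sketch below uses a made-up Series and an assumed label set of 1 to 5 to keep the output small.

```python
import pandas as pd

# Hypothetical labels; note that 2 and 4 never occur.
labels = pd.Series([1, 1, 3, 5], index=range(1, 5))

# Assumed full label set (the question's would be range(1, 21)).
all_labels = list(range(1, 6))

# Build the indicator columns, then add any missing label as an all-zero column.
full = pd.get_dummies(labels, dtype=int).reindex(columns=all_labels, fill_value=0)

print(full)
```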

Upvotes: 2

BENY

Reputation: 323346

You can also use crosstab:

pd.crosstab(df.index, df['0'])  # use df[0] if the column label is the integer 0
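
A small sketch of the crosstab approach, assuming a made-up four-row frame whose label column is the integer 0 as in the question. `crosstab` counts how often each (index value, label) pair occurs, so when the index is unique every row contains a single 1 and the values should match what `get_dummies` produces.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: integer column 0, unique index.
df = pd.DataFrame({0: [1, 1, 2, 3]}, index=range(1, 5))

# Rows come from the index values, columns from the distinct labels.
counts = pd.crosstab(df.index, df[0])

# For comparison: the same indicator matrix built with get_dummies.
dummies = pd.get_dummies(df[0], dtype=int)

print(counts)
```

If the index contained duplicates, `crosstab` would aggregate them into one row with counts, whereas `get_dummies` keeps one row per input row, so the two are only interchangeable when the index is unique.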

Upvotes: 1
