How to get.dummies and count at the same time

Question

I have a large dataset that I had to clean. Now, simplifying, I have this:

   A  B  C  D
1  1  5  2  2
4  2  5  3  1
5  3  3  2  1
8  4  1  4  4

So, the values for each column goes from 1 to 5. Now I want to transform this 4 columns in 5 dummy columns and count at the same time the amount of "values" for each row of each value, in order to have that:

   S_1 S_2 S_3 S_4 S_5
1   1   2   0   0   1
4   1   1   1   0   1
5   1   1   2   0   0
8   1   0   0   3   0

So "S_1" represents the amount of "1" for each row, "S_2" the amount of "2" of each row, and so on.

I guess this is possible with a pivot table, but I can't do it. Can anybody help me, please?

Dani Mesejo · Accepted Answer

One approach is to use collections.Counter:

import pandas as pd
from collections import Counter

data = [[1, 5, 2, 2],
        [2, 5, 3, 1],
        [3, 3, 2, 1],
        [4, 1, 4, 4]]

df = pd.DataFrame(data=data, columns=['A', 'B', 'C', 'D'], index=[1, 4, 5, 8])

total = {k: 0 for k in range(1, 6)}

result = pd.DataFrame([{**total, **Counter(row)} for row in df.values], index=df.index)

result = result.rename(columns={k: f'S_{k}' for k in total}).fillna(0)

print(result)

Output

   S_1  S_2  S_3  S_4  S_5
1    1    2    0    0    1
4    1    1    1    0    1
5    1    1    2    0    0
8    1    0    0    3    0

Use Counter to count the occurrences, the expression:

{**total, **Counter(row)}

creates a dictionary with 0 count for the missing values.

How to get.dummies and count at the same time

Answers (2)

Related Questions