Reputation: 101
I have a large dataset that I had to clean. Now, simplifying, I have this:
   A  B  C  D
1  1  5  2  2
4  2  5  3  1
5  3  3  2  1
8  4  1  4  4
So the values in each column range from 1 to 5. Now I want to turn these 4 columns into 5 dummy columns and, at the same time, count how many times each value occurs in each row, to get this:
   S_1  S_2  S_3  S_4  S_5
1    1    2    0    0    1
4    1    1    1    0    1
5    1    1    2    0    0
8    1    0    0    3    0
So "S_1" is the number of 1s in each row, "S_2" the number of 2s in each row, and so on.
I guess this is possible with a pivot table, but I can't do it. Can anybody help me, please?
Upvotes: 4
Views: 621
Reputation: 61910
One approach is to use collections.Counter:
import pandas as pd
from collections import Counter
data = [[1, 5, 2, 2],
        [2, 5, 3, 1],
        [3, 3, 2, 1],
        [4, 1, 4, 4]]
df = pd.DataFrame(data=data, columns=['A', 'B', 'C', 'D'], index=[1, 4, 5, 8])
total = {k: 0 for k in range(1, 6)}
result = pd.DataFrame([{**total, **Counter(row)} for row in df.values], index=df.index)
result = result.rename(columns={k: f'S_{k}' for k in total}).fillna(0)
print(result)
Output
   S_1  S_2  S_3  S_4  S_5
1    1    2    0    0    1
4    1    1    1    0    1
5    1    1    2    0    0
8    1    0    0    3    0
Counter counts the occurrences of each value in a row. The expression
{**total, **Counter(row)}
creates a new dictionary in which any value missing from the row keeps a count of 0.
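To make the merge step concrete, here is a minimal sketch for a single row, using the first row of the example data. The names row and merged are only illustrative and do not appear in the code above.

```python
from collections import Counter

# Zero count for every possible value 1..5
total = {k: 0 for k in range(1, 6)}

# First row of the example data
row = [1, 5, 2, 2]

# Keys on the right override keys on the left, so values that
# never occur in the row keep their 0 from `total`
merged = {**total, **Counter(row)}
print(merged)  # {1: 1, 2: 2, 3: 0, 4: 0, 5: 1}
```

Because total is unpacked first, the result also keeps the keys in order from 1 to 5, which is what gives the columns their order in the final DataFrame.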
Upvotes: 2
Reputation: 171
You can try this; I hope it helps:
import pandas as pd
from collections import defaultdict  # dictionary with a default value

df = pd.DataFrame(
    [[1, 5, 2, 2],
     [2, 5, 3, 1],
     [3, 3, 2, 1],
     [4, 1, 4, 4]],
    columns=['A', 'B', 'C', 'D'],
    index=[1, 4, 5, 8])  # row labels from the question
categories = [1, 2, 3, 4, 5]

# Count per row
rows_counts = []
for idx in df.index:
    dict_counts = defaultdict(int)
    # Get the row as a list so that list.count() can be used
    row = df.loc[idx, :].tolist()
    # Count for each category
    for category in categories:
        dict_counts[category] = row.count(category)
    # Append results
    rows_counts.append(dict_counts)

# Get desired output, keeping the original row labels
new_df = pd.DataFrame(rows_counts, index=df.index)
new_df.columns = ['S_' + str(cat) for cat in new_df.columns]
print(new_df)
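The same per-row count can also be written more compactly. The following is only a sketch of the loop above condensed into a comprehension, not a different method. It assumes the row labels 1, 4, 5, 8 from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 5, 2, 2],
     [2, 5, 3, 1],
     [3, 3, 2, 1],
     [4, 1, 4, 4]],
    columns=['A', 'B', 'C', 'D'],
    index=[1, 4, 5, 8],  # assumption: index taken from the question's example
)
categories = [1, 2, 3, 4, 5]

# One dict per row: how many times each category appears in that row
rows_counts = [
    {f'S_{cat}': row.count(cat) for cat in categories}
    for row in df.values.tolist()
]

# Reuse the original index so the row labels are preserved
new_df = pd.DataFrame(rows_counts, index=df.index)
print(new_df)
```

Building the column names inside the comprehension avoids the separate renaming step, and passing index=df.index keeps the row labels aligned with the input.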
Upvotes: 0