Reputation: 19
I hope you are doing well. I have the following output:
ClassName  Bugs  HighBugs  LowBugs  NormalBugs  WMC  LOC
Class1        4         0        1           3   34   77
Class2        0         0        0           0    9   45
Class3        3         0        1           2   10   18
Class4        0         0        0           0   44   46
Class5        6         2        2           2   78   94
The result I want is as follows:
ClassName Bugs HighBugs LowBugs NormalBugs WMC LOC
Class1 1 0 0 1 34 77
Class1 1 0 0 1 34 77
Class1 1 0 0 1 34 77
Class1 1 0 1 0 34 77
Class2 0 0 0 0 9 45
Class3 1 0 0 1 10 18
Class3 1 0 0 1 10 18
Class3 1 0 1 0 10 18
Class4 0 0 0 0 44 46
Class5 1 0 0 1 78 94
Class5 1 0 0 1 78 94
Class5 1 0 1 0 78 94
Class5 1 0 1 0 78 94
Class5 1 1 0 0 78 94
Class5 1 1 0 0 78 94
A little explanation: I want to duplicate each class according to its Bugs column, where Bugs = HighBugs + LowBugs + NormalBugs. As you can see in the desired result, once the classes are duplicated every row contains only ones and zeros, with one row per bug; a class with no bugs keeps a single all-zero row.
Thank you in advance, and have a good day, all.
Upvotes: 1
Views: 52
Reputation: 195633
Try:
dfs, col_names, other_cols = (
    [],
    ["NormalBugs", "LowBugs", "HighBugs"],
    ["ClassName", "WMC", "LOC"],
)
for _, row in df.iterrows():
    if row["Bugs"] == 0:
        dfs.append(
            pd.DataFrame(
                [[0, 0, 0, *[row[c] for c in other_cols]]],
                columns=col_names + other_cols,
            )
        )
    else:
        for c in col_names:
            dfs.append(pd.DataFrame([1] * row[c], columns=[c]))
            for oc in other_cols:
                dfs[-1][oc] = row[oc]
df_out = pd.concat(dfs).fillna(0)
df_out[col_names] = df_out[col_names].astype(int)
df_out["Bugs"] = df_out[col_names].any(axis=1).astype(int)
print(
    df_out[
        ["ClassName", "Bugs", "HighBugs", "LowBugs", "NormalBugs", "WMC", "LOC"]
    ]
)
Prints:
ClassName Bugs HighBugs LowBugs NormalBugs WMC LOC
0 Class1 1 0 0 1 34 77
1 Class1 1 0 0 1 34 77
2 Class1 1 0 0 1 34 77
0 Class1 1 0 1 0 34 77
0 Class2 0 0 0 0 9 45
0 Class3 1 0 0 1 10 18
1 Class3 1 0 0 1 10 18
0 Class3 1 0 1 0 10 18
0 Class4 0 0 0 0 44 46
0 Class5 1 0 0 1 78 94
1 Class5 1 0 0 1 78 94
0 Class5 1 0 1 0 78 94
1 Class5 1 0 1 0 78 94
0 Class5 1 1 0 0 78 94
1 Class5 1 1 0 0 78 94
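As a loop-free variation on the same idea, here is a minimal sketch that repeats each class once per bug of each severity and then concatenates the pieces. The helper name expand_bugs and its severity_cols parameter are made up for this illustration, and it assumes Bugs always equals the sum of the severity columns, as stated in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ClassName": ["Class1", "Class2", "Class3", "Class4", "Class5"],
    "Bugs": [4, 0, 3, 0, 6],
    "HighBugs": [0, 0, 0, 0, 2],
    "LowBugs": [1, 0, 1, 0, 2],
    "NormalBugs": [3, 0, 2, 0, 2],
    "WMC": [34, 9, 10, 44, 78],
    "LOC": [77, 45, 18, 46, 94],
})


def expand_bugs(df, severity_cols):
    """Hypothetical helper: one output row per bug, one severity flag per row."""
    severity_cols = list(severity_cols)
    parts = []
    for sev in severity_cols:
        # Repeat every class as many times as it has bugs of this severity.
        part = df.loc[df.index.repeat(df[sev])].copy()
        part[severity_cols] = 0  # clear all severity flags ...
        part[sev] = 1            # ... then set only the current one
        part["Bugs"] = 1
        parts.append(part)
    # Classes without any bug keep their single all-zero row
    # (assumes Bugs is 0 whenever every severity count is 0).
    parts.append(df[df[severity_cols].sum(axis=1) == 0])
    # A stable sort on the original index restores the class order and keeps
    # the severity order given in severity_cols within each class.
    return pd.concat(parts).sort_index(kind="mergesort").reset_index(drop=True)


out = expand_bugs(df, ["NormalBugs", "LowBugs", "HighBugs"])
print(out)
```

The order of severity_cols decides the row order within a class; with the order used here it should match the layout requested in the question.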
EDIT: Added more columns.
Upvotes: 1
Reputation: 35686
We can find the max value in each row using DataFrame.max on axis=1, then use Index.repeat to scale up the DataFrame based on the maximal value for a given class. Lastly, we can number the rows within each group using groupby cumcount and check where each value is greater than (DataFrame.gt) its group row number:
cols = df.columns[df.columns.str.endswith('Bugs')]
df = df.loc[
    df.index.repeat(df[cols].max(axis=1).clip(lower=1))
].reset_index(drop=True)
df[cols] = df[cols].gt(df.groupby('ClassName').cumcount(), axis=0).astype(int)
df:
ClassName Bugs HighBugs LowBugs NormalBugs
0 Class1 1 0 1 1
1 Class1 1 0 0 1
2 Class1 1 0 0 1
3 Class1 1 0 0 0
4 Class2 0 0 0 0
5 Class3 1 0 1 1
6 Class3 1 0 0 1
7 Class3 1 0 0 0
8 Class4 0 0 0 0
9 Class5 1 1 1 1
10 Class5 1 1 1 1
11 Class5 1 0 0 0
12 Class5 1 0 0 0
13 Class5 1 0 0 0
14 Class5 1 0 0 0
Setup:
import pandas as pd

df = pd.DataFrame({
    'ClassName': {0: 'Class1', 1: 'Class2', 2: 'Class3', 3: 'Class4',
                  4: 'Class5'},
    'Bugs': {0: 4, 1: 0, 2: 3, 3: 0, 4: 6},
    'HighBugs': {0: 0, 1: 0, 2: 0, 3: 0, 4: 2},
    'LowBugs': {0: 1, 1: 0, 2: 1, 3: 0, 4: 2},
    'NormalBugs': {0: 3, 1: 0, 2: 2, 3: 0, 4: 2}
})
Column filter:
cols = df.columns[df.columns.str.endswith('Bugs')]
Index(['Bugs', 'HighBugs', 'LowBugs', 'NormalBugs'], dtype='object')
Max value per row (to repeat):
df[cols].max(axis=1).clip(lower=1)
0 4
1 1
2 3
3 1
4 6
dtype: int64
Scaled DataFrame:
df = df.loc[
    df.index.repeat(df[cols].max(axis=1).clip(lower=1))
].reset_index(drop=True)
ClassName Bugs HighBugs LowBugs NormalBugs
0 Class1 4 0 1 3
1 Class1 4 0 1 3
2 Class1 4 0 1 3
3 Class1 4 0 1 3
4 Class2 0 0 0 0
5 Class3 3 0 1 2
6 Class3 3 0 1 2
7 Class3 3 0 1 2
8 Class4 0 0 0 0
9 Class5 6 2 2 2
10 Class5 6 2 2 2
11 Class5 6 2 2 2
12 Class5 6 2 2 2
13 Class5 6 2 2 2
14 Class5 6 2 2 2
Row number within each group:
df.groupby('ClassName').cumcount()
0 0
1 1
2 2
3 3
4 0
5 0
6 1
7 2
8 0
9 0
10 1
11 2
12 3
13 4
14 5
dtype: int64
Comparison to convert the counts to 0/1 flags:
df[cols].gt(df.groupby('ClassName').cumcount(), axis=0)
Bugs HighBugs LowBugs NormalBugs
0 True False True True
1 True False False True
2 True False False True
3 True False False False
4 False False False False
5 True False True True
6 True False False True
7 True False False False
8 False False False False
9 True True True True
10 True True True True
11 True False False False
12 True False False False
13 True False False False
14 True False False False
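A quick sanity check on this approach, sketched under the assumption that the setup data above is used unchanged (the variable names original, expanded, totals and multi_flag are only for this illustration). It confirms that the per-class totals of every *Bugs column survive the expansion, and it counts the rows that carry more than one severity flag, which is where this layout differs from the one-flag-per-row table in the question.

```python
import pandas as pd

# Setup data from this answer, kept under a separate name for comparison.
original = pd.DataFrame({
    "ClassName": ["Class1", "Class2", "Class3", "Class4", "Class5"],
    "Bugs": [4, 0, 3, 0, 6],
    "HighBugs": [0, 0, 0, 0, 2],
    "LowBugs": [1, 0, 1, 0, 2],
    "NormalBugs": [3, 0, 2, 0, 2],
})

# Same steps as in the answer: repeat, number the rows per class, compare.
cols = original.columns[original.columns.str.endswith("Bugs")]
expanded = original.loc[
    original.index.repeat(original[cols].max(axis=1).clip(lower=1))
].reset_index(drop=True)
expanded[cols] = expanded[cols].gt(
    expanded.groupby("ClassName").cumcount(), axis=0
).astype(int)

# Per-class totals of each *Bugs column should equal the original counts.
totals = expanded.groupby("ClassName")[list(cols)].sum()
expected = original.set_index("ClassName")[list(cols)]
print((totals.to_numpy() == expected.to_numpy()).all())

# Rows where more than one severity flag is set at the same time.
multi_flag = expanded[["HighBugs", "LowBugs", "NormalBugs"]].sum(axis=1) > 1
print(expanded[multi_flag])
```

If exactly one severity flag per row is required, the flags have to be spread across different rows of a class rather than all starting at the first row.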
Upvotes: 1