Miraiinik

Reputation: 19

Duplicating rows in pandas Python

I hope you are doing well. I have the following output:

ClassName   Bugs   HighBugs  LowBugs  NormalBugs  WMC   LOC

 Class1      4        0        1         3        34     77 
 Class2      0        0        0         0        9      45
 Class3      3        0        1         2        10     18
 Class4      0        0        0         0        44     46
 Class5      6        2        2         2        78     94

The result I want is as follows:

ClassName   Bugs   HighBugs  LowBugs  NormalBugs  WMC   LOC

 Class1      1        0        0         1        34     77
 Class1      1        0        0         1        34     77
 Class1      1        0        0         1        34     77
 Class1      1        0        1         0        34     77
 Class2      0        0        0         0        9      45
 Class3      1        0        0         1        10     18
 Class3      1        0        0         1        10     18
 Class3      1        0        1         0        10     18
 Class4      0        0        0         0        44     46
 Class5      1        0        0         1        78     94
 Class5      1        0        0         1        78     94
 Class5      1        0        1         0        78     94
 Class5      1        0        1         0        78     94
 Class5      1        1        0         0        78     94
 Class5      1        1        0         0        78     94

A little explanation: I want to duplicate each class according to the Bugs column, where Bugs = HighBugs + LowBugs + NormalBugs. As you can see in the desired result, each duplicated row should contain only ones and zeros: one row per bug, with a 1 in Bugs and in the matching severity column. Classes with no bugs keep a single row of zeros.
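
In case it helps to reproduce this, here is a minimal sketch that builds the input shown above as a DataFrame and checks that Bugs = HighBugs + LowBugs + NormalBugs holds for every row. The variable names df and severity_total are just my choice.

```python
import pandas as pd

# Sample input copied from the first table in the question
df = pd.DataFrame({
    "ClassName": ["Class1", "Class2", "Class3", "Class4", "Class5"],
    "Bugs": [4, 0, 3, 0, 6],
    "HighBugs": [0, 0, 0, 0, 2],
    "LowBugs": [1, 0, 1, 0, 2],
    "NormalBugs": [3, 0, 2, 0, 2],
    "WMC": [34, 9, 10, 44, 78],
    "LOC": [77, 45, 18, 46, 94],
})

# Bugs should equal the sum of the three severity columns on every row
severity_total = df[["HighBugs", "LowBugs", "NormalBugs"]].sum(axis=1)
print(df["Bugs"].eq(severity_total).all())
```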

Thank you in advance, and have a good day.

Upvotes: 1

Views: 52

Answers (2)

Andrej Kesely

Reputation: 195633

Try:

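import pandas as pd

# Assumes df is the DataFrame shown in the question
dfs, col_names, other_cols = (
    [],
    ["NormalBugs", "LowBugs", "HighBugs"],
    ["ClassName", "WMC", "LOC"],
)
for _, row in df.iterrows():
    if row["Bugs"] == 0:
        # No bugs: keep a single all-zero row for this class
        dfs.append(
            pd.DataFrame(
                [[0, 0, 0, *[row[c] for c in other_cols]]],
                columns=col_names + other_cols,
            )
        )

    else:
        for c in col_names:
            # One row per bug of this severity, flagged with a 1
            dfs.append(pd.DataFrame([1] * row[c], columns=[c]))
            # Copy the unchanged columns onto the new rows
            for oc in other_cols:
                dfs[-1][oc] = row[oc]


# Severity columns absent from a piece become NaN after concat; fill with 0
df_out = pd.concat(dfs).fillna(0)
df_out[col_names] = df_out[col_names].astype(int)
# Bugs is 1 whenever any severity flag is set on the row
df_out["Bugs"] = df_out[col_names].any(axis=1).astype(int)
print(
    df_out[
        ["ClassName", "Bugs", "HighBugs", "LowBugs", "NormalBugs", "WMC", "LOC"]
    ]
)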

Prints:

  ClassName  Bugs  HighBugs  LowBugs  NormalBugs  WMC  LOC
0    Class1     1         0        0           1   34   77
1    Class1     1         0        0           1   34   77
2    Class1     1         0        0           1   34   77
0    Class1     1         0        1           0   34   77
0    Class2     0         0        0           0    9   45
0    Class3     1         0        0           1   10   18
1    Class3     1         0        0           1   10   18
0    Class3     1         0        1           0   10   18
0    Class4     0         0        0           0   44   46
0    Class5     1         0        0           1   78   94
1    Class5     1         0        0           1   78   94
0    Class5     1         0        1           0   78   94
1    Class5     1         0        1           0   78   94
0    Class5     1         1        0           0   78   94
1    Class5     1         1        0           0   78   94

EDIT: Added more columns.
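
The index in the printed output repeats (0, 1, 2, 0, 0, ...) because each small DataFrame keeps its own 0-based index when concatenated. If a clean running index is preferred, pd.concat accepts ignore_index=True. A minimal sketch with two made-up pieces (piece_a and piece_b are hypothetical names, not part of the code above):

```python
import pandas as pd

# Two hypothetical pieces, each carrying its own 0-based index
piece_a = pd.DataFrame({"ClassName": ["Class1", "Class1"], "NormalBugs": [1, 1]})
piece_b = pd.DataFrame({"ClassName": ["Class2"], "NormalBugs": [0]})

# Default behaviour keeps the original labels, so they repeat
stacked = pd.concat([piece_a, piece_b])

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([piece_a, piece_b], ignore_index=True)
print(renumbered)
```

Calling reset_index(drop=True) on the concatenated result should give the same effect.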

Upvotes: 1

Henry Ecker

Reputation: 35686

We can find the max value in each row using DataFrame.max on axis=1, then use Index.repeat to repeat each row that many times (clipped to a minimum of 1 so that classes with no bugs keep one row). Lastly, we can number the rows within each class using groupby cumcount and set each value to 1 where the original count is greater than (DataFrame.gt) that row number:

cols = df.columns[df.columns.str.endswith('Bugs')]
df = df.loc[
    df.index.repeat(df[cols].max(axis=1).clip(lower=1))
].reset_index(drop=True)
df[cols] = df[cols].gt(df.groupby('ClassName').cumcount(), axis=0).astype(int)

df:

   ClassName  Bugs  HighBugs  LowBugs  NormalBugs
0     Class1     1         0        1           1
1     Class1     1         0        0           1
2     Class1     1         0        0           1
3     Class1     1         0        0           0
4     Class2     0         0        0           0
5     Class3     1         0        1           1
6     Class3     1         0        0           1
7     Class3     1         0        0           0
8     Class4     0         0        0           0
9     Class5     1         1        1           1
10    Class5     1         1        1           1
11    Class5     1         0        0           0
12    Class5     1         0        0           0
13    Class5     1         0        0           0
14    Class5     1         0        0           0

Setup:

import pandas as pd

df = pd.DataFrame({
    'ClassName': {0: 'Class1', 1: 'Class2', 2: 'Class3', 3: 'Class4',
                  4: 'Class5'},
    'Bugs': {0: 4, 1: 0, 2: 3, 3: 0, 4: 6},
    'HighBugs': {0: 0, 1: 0, 2: 0, 3: 0, 4: 2},
    'LowBugs': {0: 1, 1: 0, 2: 1, 3: 0, 4: 2},
    'NormalBugs': {0: 3, 1: 0, 2: 2, 3: 0, 4: 2}
})

Column filter:

cols = df.columns[df.columns.str.endswith('Bugs')]

Index(['Bugs', 'HighBugs', 'LowBugs', 'NormalBugs'], dtype='object')

Max value per row (the number of times to repeat it, clipped to a minimum of 1):

df[cols].max(axis=1).clip(lower=1)

0    4
1    1
2    3
3    1
4    6
dtype: int64

Scaled DataFrame:

df = df.loc[
    df.index.repeat(df[cols].max(axis=1).clip(lower=1))
].reset_index(drop=True)

   ClassName  Bugs  HighBugs  LowBugs  NormalBugs
0     Class1     4         0        1           3
1     Class1     4         0        1           3
2     Class1     4         0        1           3
3     Class1     4         0        1           3
4     Class2     0         0        0           0
5     Class3     3         0        1           2
6     Class3     3         0        1           2
7     Class3     3         0        1           2
8     Class4     0         0        0           0
9     Class5     6         2        2           2
10    Class5     6         2        2           2
11    Class5     6         2        2           2
12    Class5     6         2        2           2
13    Class5     6         2        2           2
14    Class5     6         2        2           2

Row number within each group:

df.groupby('ClassName').cumcount()

0     0
1     1
2     2
3     3
4     0
5     0
6     1
7     2
8     0
9     0
10    1
11    2
12    3
13    4
14    5
dtype: int64

Comparison to convert the counts to binary flags:

df[cols].gt(df.groupby('ClassName').cumcount(), axis=0)

     Bugs  HighBugs  LowBugs  NormalBugs
0    True     False     True        True
1    True     False    False        True
2    True     False    False        True
3    True     False    False       False
4   False     False    False       False
5    True     False     True        True
6    True     False    False        True
7    True     False    False       False
8   False     False    False       False
9    True      True     True        True
10   True      True     True        True
11   True     False    False       False
12   True     False    False       False
13   True     False    False       False
14   True     False    False       False
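
Note that with this comparison a single row can have several severity flags set at once. If exactly one severity per duplicated row is wanted, as in the question's expected output, the same Index.repeat idea can be applied once per severity column. This is only a sketch under some assumptions: Bugs equals the sum of the three severity columns, the index is unique, and the helper name expand_one_bug_per_row is made up for illustration. The WMC and LOC values are taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ClassName": ["Class1", "Class2", "Class3", "Class4", "Class5"],
    "Bugs": [4, 0, 3, 0, 6],
    "HighBugs": [0, 0, 0, 0, 2],
    "LowBugs": [1, 0, 1, 0, 2],
    "NormalBugs": [3, 0, 2, 0, 2],
    "WMC": [34, 9, 10, 44, 78],
    "LOC": [77, 45, 18, 46, 94],
})

# Order in which severities are emitted within each class
severity_cols = ["NormalBugs", "LowBugs", "HighBugs"]


def expand_one_bug_per_row(frame, severity_cols):
    """Hypothetical helper: one output row per bug, one severity flag per row."""
    pieces = []
    for col in severity_cols:
        # Repeat each source row once per bug of this severity
        part = frame.loc[frame.index.repeat(frame[col])].copy()
        part[severity_cols] = 0
        part[col] = 1
        part["Bugs"] = 1
        pieces.append(part)
    # Classes with no bugs keep their single all-zero row
    pieces.append(frame[frame["Bugs"].eq(0)])
    out = pd.concat(pieces)
    # A stable sort on the original index groups rows by class again
    # while preserving the severity order within each class
    return out.sort_index(kind="mergesort").reset_index(drop=True)


out = expand_one_bug_per_row(df, severity_cols)
print(out)
```

Summing the severity columns per ClassName in the result should give back the original counts, which is a quick way to sanity-check the expansion.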

Upvotes: 1
