Dance Party
Dance Party

Reputation: 3713

Pandas add empty rows as filler

Given the following data frame:

import pandas as pd
DF = pd.DataFrame({'COL1': ['A', 'A','B'], 
                   'COL2' : [1,2,1],
                   'COL3' : ['X','Y','X']})

DF

  COL1  COL2   COL3
0   A    1      X
1   A    2      Y
2   B    1      X

I would like to have an additional row for COL1 = 'B' so that both values (COL1 A and B) are represented by the COL3 values X and Y, with a 0 for COL2 in the generated row.

The desired result is as follows:

  COL1  COL2   COL3
0   A    1      X
1   A    2      Y
2   B    1      X
3   B    0      Y

This is just a simplified example, but I need a calculation that could handle many such instances (and not just inserting the row in interest manually).

Thanks in advance!

UPDATE:

For a generalized scenario where there are many different combinations of values under 'COL1' and 'COL3', this works but is probably not nearly as efficient as it can be:

#Get unique set of COL3
COL3SET = set(DF['COL3'])
#Get unique set of COL1
COL1SET = set(DF['COL1'])
#Get all possible combinations of unique sets
import itertools
COMB=[]
for combination in itertools.product(COL1SET, COL3SET):
    COMB.append(combination)
#Create dataframe from new set:
UNQ = pd.DataFrame({'COMB':COMB})

#Split tuples into columns
new_col_list = ['COL1unq','COL3unq']
for n,col in enumerate(new_col_list):
    UNQ[col] = UNQ['COMB'].apply(lambda COMB: COMB[n])
UNQ = UNQ.drop('COMB',axis=1)

#Merge original data frame with unique set data frame
DF = pd.merge(DF,UNQ,left_on=['COL1','COL3'],right_on=['COL1unq','COL3unq'],how='outer')

#Fill in empty values of COL1 and COL3 where they did not have records
DF['COL1'] = DF['COL1unq']
DF['COL3'] = DF['COL3unq']

#Replace 'NaN's in column 2 with zeros
DF['COL2'].fillna(0, inplace=True)

#Get rid of COL1unq and COL3unq
DF.drop(['COL1unq','COL3unq'],axis=1, inplace=True)
DF

Upvotes: 0

Views: 962

Answers (1)

Alexander
Alexander

Reputation: 109526

Something like this?

col1_b_vals = set(DF.loc[DF.COL1 == 'B', 'COL3'])
col1_not_b_col3_vals = set(DF.loc[DF.COL1 != 'B', 'COL3'])
missing_vals = col1_not_b_col3_vals.difference(col1_b_vals)
missing_rows = DF.loc[(DF.COL1 != 'B') & (DF.COL3.isin(missing_vals)), :]
missing_rows['COL1'] = 'B'
missing_rows['COL2'] = 0
>>> pd.concat([DF, missing_rows], ignore_index=True)
  COL1  COL2 COL3
0    A     1    X
1    A     2    Y
2    B     1    X
3    B     0    Y

Upvotes: 1

Related Questions