Retain the values only in those rows of the column based on the condition on other columns in pandas

Question

I have a dataframe df_in, which contains column names that start with pi and pm.

df_in = pd.DataFrame([[1,2,3,4,"",6,7,8,9],["",1,32,43,59,65,"",83,97],["",51,62,47,58,64,74,86,99],[73,51,42,67,54,65,"",85,92]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])

If a row in column name which starts with pi is blank, make the same rows of columns which starts with pm also blank till we have a new column which starts with pi. And repeat the same process for other columns also.

Expected Output:

df_out = pd.DataFrame([[1,2,3,4,"","",7,8,9],["","","","",59,65,"","",""],["","","","",58,64,74,86,99],[73,51,42,67,54,65,"","",""]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])

How to do it?

jezrael · Accepted Answer

You can create groups by compare columns names by str.startswith with cumulative sum and then compare values by empty spaces in groupby for mask used for set empty spaces in DataFrame.mask:

g = df_in.columns.str.startswith('pi').cumsum()
df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform(lambda x: x.iat[0]), '')

#first for me failed in pandas 1.2.3
#df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform('first'), '')


print (df)
  piabc pmed pmrde pmret pirtc pmere piuyt pmfgf pmthg
0     1    2     3     4                 7     8     9
1                           59    65                  
2                           58    64    74    86    99
3    73   51    42    67    54    65

Retain the values only in those rows of the column based on the condition on other columns in pandas

Answers (1)

Related Questions