Parth Tiwari

Reputation: 486

How to apply a function to a DataFrame column based on a condition in another column

I am trying to apply a function to one column of a DataFrame, but only for rows where another column, df['mask'], is True; rows where it is False should be skipped. The mask column has bool dtype.

This is my function:

     def dates(inp):
        temp = inp
        parser = CommonRegex()
        inp = inp.apply(parser.dates).str.join(', ')
        return np.where(inp.apply(parser.dates).str.len() == 0, temp, 'X' * random.randrange(3, 8)) 

This is how I applied it:

      df1.assign(**df1['Dates'].apply(dates).where(df1['mask']== TRUE))

It throws an error:

         32     temp = inp
         33     parser = CommonRegex()
    ---> 34     inp = inp.apply(parser.dates).str.join(', ')
         35     return np.where(inp.apply(parser.dates).str.len() == 0, temp, 'X' * random.randrange(3, 8))
         36 

    AttributeError: 'Timestamp' object has no attribute 'apply'    

Here is what my DataFrame looks like:

         Name     |  Dates   |  mask |
         ..............................
         Tom      | 21/02/2018| True
         Nick     | 28/07/2018| False
         Juli     | 11/08/2018| True
         June     | 01/02/2018| True
         XHGM     | 07/08/2018| False   

I want the output to look like this: rows where mask is False are left unchanged, and rows where mask is True are passed to the dates function, which hides the date values.

         Name     |  Dates   |  mask |
         ..............................
         Tom      | XXXXX     | True
         Nick     |28/07/2018 | False
         Juli     | XXXXX     | True
         June     | XXXXX     | True
         XHGM     | 07/08/2018| False     

Upvotes: 1

Views: 51

Answers (1)

jezrael

Reputation: 862406

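Series.apply calls the function once per element, so inside dates the argument inp is a single Timestamp rather than a Series, which is why the AttributeError is raised. Use Series.pipe to pass the whole column to the function, filter the rows by boolean indexing with the mask, and use DataFrame.loc to specify the column name: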

    df1.loc[df1['mask'], 'Dates'] = df1.loc[df1['mask'], 'Dates'].pipe(dates)
    print(df1)
       Name       Dates   mask
    0   Tom         XXX   True
    1  Nick  28/07/2018  False
    2  Juli         XXX   True
    3  June         XXX   True
    4  XHGM  07/08/2018  False
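
For anyone who wants to try the filter-then-assign approach without installing CommonRegex, here is a minimal self-contained sketch. It makes some assumptions that are not in the question: the date detection is replaced by a plain regular expression for dd/mm/yyyy, mask_dates is a made-up name standing in for dates, and the Dates column is assumed to hold strings rather than Timestamps.

```python
import random

import pandas as pd

# Hypothetical stand-in for CommonRegex().dates: matches dd/mm/yyyy only.
DATE_PATTERN = r"\d{2}/\d{2}/\d{4}"


def mask_dates(values: pd.Series) -> pd.Series:
    """Replace every value that contains a date with a run of 3 to 7 'X's."""
    has_date = values.astype(str).str.contains(DATE_PATTERN, regex=True)
    filler = "X" * random.randrange(3, 8)  # one random length per call
    # Series.where keeps values where the condition is True, so negate it.
    return values.where(~has_date, filler)


df1 = pd.DataFrame({
    "Name": ["Tom", "Nick", "Juli", "June", "XHGM"],
    "Dates": ["21/02/2018", "28/07/2018", "11/08/2018",
              "01/02/2018", "07/08/2018"],
    "mask": [True, False, True, True, False],
})

selected = df1["mask"]
# pipe hands the whole filtered Series to mask_dates in a single call.
df1.loc[selected, "Dates"] = df1.loc[selected, "Dates"].pipe(mask_dates)
print(df1)
```

Because the filler length is drawn once per call, all masked rows get the same number of X characters, the same as with the original dates function.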

A solution with assign is possible too, but its disadvantage is that the function processes all values before the filtering happens, so it should be slower on a large DataFrame that has only a few True values:

    df1 = df1.assign(Dates = np.where(df1['mask'], df1['Dates'].pipe(dates), df1['Dates']))
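
To illustrate the difference in how much work the function does, here is a small sketch that records the number of rows each approach passes to it. The redact function, the seen list and the three-row frame are hypothetical and only for illustration; Series.where is used in place of np.where, which selects values in the same way here.

```python
import pandas as pd

seen = []  # number of rows received on each call


def redact(values: pd.Series) -> pd.Series:
    """Toy replacement for dates: overwrite every value it is given."""
    seen.append(len(values))
    return pd.Series("XXX", index=values.index)


df = pd.DataFrame({
    "Dates": ["21/02/2018", "28/07/2018", "11/08/2018"],
    "mask": [True, False, True],
})

# Filter first: only the rows where mask is True reach the function.
filtered = df.copy()
rows = filtered["mask"]
filtered.loc[rows, "Dates"] = filtered.loc[rows, "Dates"].pipe(redact)

# Transform everything, then keep the result only where mask is True.
assigned = df.assign(Dates=df["Dates"].pipe(redact).where(df["mask"], df["Dates"]))

print(seen)  # [2, 3]
print(filtered["Dates"].tolist())
print(assigned["Dates"].tolist())
```

Both approaches produce the same column; the first one simply calls the function on fewer rows.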

Upvotes: 1
