amrutha
amrutha

Reputation: 193

fill blanks in a column based on conditions pandas

I have a dataframe df:

Site    cells   Azimut  technology  npv scpci
T30264  G30264B 130     UMTS900     343 276
T30992  G30992A 10      UMTS900     171 12
T30992  G30992B 260      UMTS900    173 13
T30992  U30992A 10      UMTS2100    171 12
T30992  U30992B 260     UMTS2100    173 13
T31520  G31520A 0        UMTS900    72  500
T31520  G31520B 120      UMTS900    73  501
T31520  G31520C 220      UMTS900    74  502
T31548  G31548A 30       UMTS900    93  450
T31548  G31548B 130      UMTS900    94  451
T31548  G31548C 250      UMTS900    95  452
T31548  U31548N 30      UMTS2100    94  450
T31548  U31548O 130     UMTS2100    95  451
T31548  U31548P 250      UMTS2100   96  452

I want to add a column 'c1' to dataframe df such that I want to fill the blanks in that column based on the some conditions.

If the cellname ends with 'A' or 'N' in that particular Site, check the value for npv/scpci column which is exactly divisible by 3. Whichever value from both columns is divisible by 3, update that value to c1, and corrsponding cells ending with 'B','C','O','P' in that same Site should be updated with the same column values that 'A' or 'N' are updated. If values from both are divisible by 3, then copy npv column value.

If that site has only 1 cell regardless of ending with any alphabet, update the value which is divisible by 3.

Required Output:

Site    cells   Azimut  technology  npv scpci   final value
T30264  G30264B 130      UMTS900    343 276      276
T30992  G30992A 10       UMTS900    171 12       171
T30992  G30992B 260      UMTS900    173 13       173
T30992  U30992A 10       UMTS2100   171 12       171
T30992  U30992B 260      UMTS2100   173 13       173
T31520  G31520A 0        UMTS900    72  500       72
T31520  G31520B 120      UMTS900    73  501       73
T31520  G31520C 220      UMTS900    74  502       74
T31548  G31548A 30       UMTS900    93  450       93
T31548  G31548B 130      UMTS900    94  451       94
T31548  G31548C 250      UMTS900    95  452       95
T31548  U31548N 30       UMTS2100   94  450      450
T31548  U31548O 130     UMTS2100    95  451      451
T31548  U31548P 250     UMTS2100    96  452      452

Upvotes: 0

Views: 1540

Answers (1)

Ben.T
Ben.T

Reputation: 29635

Here is one way.Let's call your dataframe df. First work is on the case where cells ends by A or N.

# create the mask when finisig by A or N
mask_AN = (df['cells'].str[-1] == 'A') | (df['cells'].str[-1] == 'N')
# create the column final_value and write 
# 1 if the value should be from the column npv and 
#2 if the value should be from the column scpci
df.loc[mask_AN,'final_value'] = pd.np.where((df.loc[mask_AN,'scpci']%3 == 0) 
                                              & (df.loc[mask_AN,'npv']%3 != 0),2,1)

The np.where works as: the final_value should be from scpci (so 2 for now) only if the column scpci is divisible by 3 while the column npv is not, otherwise the final_value will be from npv (so 1). The next step is to fill final_value for a same site with the value where cells ends by A or N. This can be done by:

df['final_value'] = df.groupby('Site')['final_value'].ffill() # fill forward

Note that the filling works here as it seems that you have a cell ending by 'A' before one endings by 'B' or 'C' (except when unique) and same a cell ending by 'N' before one endings with 'O' and 'P'. This ffill might not work if you don't always have this configuration.

Finally, you need to do the site with unique row:

# mask of site with unique cell
df_g = df.reset_index().groupby('Site')
mask_unique = df_g.index.first()[df_g.cells.count() ==1]
# a bit on the same idea than before for adding 1 or 2 in the final_value column
df['final_value'].loc[mask_unique] = pd.np.where((df['scpci'].loc[mask_unique]%3 == 0) 
                                                   & (df['npv'].loc[mask_unique]%3 != 0),2,1)

Now that you have 1 or 2 in this final_value column, just need to replace by the value in the associated column:

df['final_value'] = pd.np.where( df['final_value'] == 1, df['npv'], df['scpci'])

The output is like expected:

      Site    cells  Azimut technology  npv  scpci  final_value
0   T30264  G30264B     130    UMTS900  343    276          276
1   T30992  G30992A      10    UMTS900  171     12          171
2   T30992  G30992B     260    UMTS900  173     13          173
3   T30992  U30992A      10   UMTS2100  171     12          171
4   T30992  U30992B     260   UMTS2100  173     13          173
5   T31520  G31520A       0    UMTS900   72    500           72
6   T31520  G31520B     120    UMTS900   73    501           73
7   T31520  G31520C     220    UMTS900   74    502           74
8   T31548  G31548A      30    UMTS900   93    450           93
9   T31548  G31548B     130    UMTS900   94    451           94
10  T31548  G31548C     250    UMTS900   95    452           95
11  T31548  U31548N      30   UMTS2100   94    450          450
12  T31548  U31548O     130   UMTS2100   95    451          451
13  T31548  U31548P     250   UMTS2100   96    452          452

Upvotes: 2

Related Questions