Getting the content of a pandas row based on conditions in other rows

Question

I have a pandas DataFrame df1 with the following content:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            
   B              11            15
   C              12            11
   C                            9
   C              12            13
   C              12           
   D               3             4

I would like to count the number of occurrences of each unique serial. If a serial occurs fewer than 2 times, I would like to set year and current to NaN for its row(s). I would like to get something like this:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            
   B              11            15
   C              12            11
   C                             9
   C              12            13
   C              12 
   D              nan           nan

root · Accepted Answer

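You can combine value_counts, lt and reindex to build a boolean array marking the rows that should be set to NaN, and then use loc to make the changes. value_counts counts the rows for each serial, lt(2) flags the serials that occur fewer than 2 times, and reindex maps those per-serial flags back onto every row of df1.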

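import numpy as np

serial_filter = df1['Serial N'].value_counts().lt(2).reindex(df1['Serial N'])  # True where the row's serial occurs < 2 times
df1.loc[serial_filter.values, ['year', 'current']] = np.nan  # .values: serial_filter is indexed by serial, not by df1's row labels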
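For a fully reproducible run, here is a sketch that rebuilds df1 from the question and splits the one-liner into its three steps. It assumes the blank cells in the question are missing values (NaN); the intermediate names counts and rare are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN.
df1 = pd.DataFrame({
    'Serial N': ['B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D'],
    'year':     [10, 10, 11, 11, 11, 12, np.nan, 12, 12, 3],
    'current':  [14, 16, 10, np.nan, 15, 11, 9, 13, np.nan, 4],
})

# Step 1: number of rows per serial (index = serial label, values = counts).
counts = df1['Serial N'].value_counts()

# Step 2: True for serials that occur fewer than 2 times.
rare = counts.lt(2)

# Step 3: look up each row's serial in `rare`. The result has one entry per
# row of df1, but its index is the serial labels, not df1's row labels.
serial_filter = rare.reindex(df1['Serial N'])

# Use the raw boolean array so loc does not try to align on the serial index.
df1.loc[serial_filter.values, ['year', 'current']] = np.nan
print(df1)
```

Only serial D appears once, so only its row is overwritten; the NaN values already present for B and C are left as they were.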

The resulting output:

  Serial N  year  current
0        B  10.0     14.0
1        B  10.0     16.0
2        B  11.0     10.0
3        B  11.0      NaN
4        B  11.0     15.0
5        C  12.0     11.0
6        C   NaN      9.0
7        C  12.0     13.0
8        C  12.0      NaN
9        D   NaN      NaN
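An alternative, not part of the answer above, is to map each row's serial to its count with Series.map. The result is already aligned with df1's row index, so no .values call is needed. A minimal sketch, again assuming the blank cells are NaN:

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN.
df1 = pd.DataFrame({
    'Serial N': ['B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D'],
    'year':     [10, 10, 11, 11, 11, 12, np.nan, 12, 12, 3],
    'current':  [14, 16, 10, np.nan, 15, 11, 9, 13, np.nan, 4],
})

# For every row, how many rows share its serial (indexed like df1).
counts_per_row = df1['Serial N'].map(df1['Serial N'].value_counts())

# Boolean mask aligned with df1's index: True where the serial occurs < 2 times.
mask = counts_per_row.lt(2)

df1.loc[mask, ['year', 'current']] = np.nan
print(df1)
```

Both versions produce the same frame; the map version simply avoids the index mismatch that makes .values necessary after reindex.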
