Reputation: 175
I have a dataframe called df that looks similar to this (except that the number of Date columns goes up to Date_8 and there are several hundred clients - I have simplified it here).
Client_ID Date_1 Date_2 Date_3 Date_4
C1019876 relationship no change no change no change
C1018765 no change single no change no change
C1017654 single no change relationship NaN
C1016543 NaN relationship no change single
C1015432 NaN no change single NaN
I want to create two new columns, first_status and last_status. first_status should equal the first given relationship status in the 4 date columns, i.e. the first response that is either relationship or single, while last_status should equal the last given relationship status in the 4 date columns. The resulting df should look like this.
Client_ID Date_1 Date_2 Date_3 Date_4 first_status last_status
C1019876 relationship no change no change no change relationship relationship
C1018765 no change single no change no change single single
C1017654 single no change relationship NaN single relationship
C1016543 NaN relationship no change single relationship single
C1015432 NaN no change single NaN single single
I think these two columns can be created through list comprehension, but I don't know how. For the first_status column I imagine the code would perform something like the following on every row in the df:

Go to the first Date column where a value is given (filters out NaN)
If the value is no change, go to the next Date column
If the value is relationship, first_status = relationship
If the value is single, first_status = single

For the last_status column I imagine the code would perform something like the following on every row in the df:

Go to the last Date column where a value is given (filters out NaN)
If the value is no change, go to the previous Date column
If the value is relationship, last_status = relationship
If the value is single, last_status = single
Upvotes: 0
Views: 49
Reputation: 14113
I suppose if you really wanted to use list comprehension you can, but the solution from @yatu will be much faster:
import numpy as np

# unstack and find the position of the first column where 'relationship' or 'single' occurs
first = df.unstack().groupby(level=1).apply(lambda x: np.isin(x.values, ['relationship', 'single']).argmax())
last = df.unstack()[::-1].groupby(level=1).apply(lambda x: np.isin(x.values, ['relationship', 'single']).argmax())
# list comprehensions to build (row, column) position pairs
f_list = [x for x in enumerate(first)]
l_list = [x for x in enumerate(last)]
# list comprehensions with iloc to pull out the values
f_val = [df.iloc[pos] for pos in f_list]
l_val = [df.loc[:, ::-1].iloc[pos] for pos in l_list]
# create columns
df['first'] = f_val
df['last'] = l_val
Client_ID Date_1 Date_2 Date_3 Date_4 \
0 C1019876 relationship no change no change no change
1 C1018765 no change single no change no change
2 C1017654 single no change relationship NaN
3 C1016543 NaN relationship no change single
4 C1015432 NaN no change single NaN
first last
0 relationship relationship
1 single single
2 single relationship
3 relationship single
4 single single
timeit results: 8 ms ± 230 µs per loop (mean ± std. dev. of 3 runs, 1000 loops each)
Upvotes: 0
Reputation: 88275
You can replace no change with np.nan, and select the first and last valid values using bfill and ffill respectively:
import numpy as np

df = df.replace('no change', np.nan)
df['first_status'] = df.bfill(axis=1).Date_1
df['last_status'] = df.loc[:, :'Date_4'].ffill(axis=1).Date_4
#df = df.fillna('no change')  # if needed
Client_ID Date_1 Date_2 Date_3 Date_4 first_status \
0 C1019876 relationship NaN NaN NaN relationship
1 C1018765 NaN single NaN NaN single
2 C1017654 single NaN relationship NaN single
3 C1016543 NaN relationship NaN single relationship
4 C1015432 NaN NaN single NaN single
last_status
0 relationship
1 single
2 relationship
3 single
4 single
In the case of having Date columns up to Date_n, use df.loc[:, :'Date_n'].ffill(axis=1).Date_n for the last_status.
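For example, the column names don't have to be hard-coded at all: a small sketch of the same bfill/ffill idea (assuming all date columns share the Date_ prefix; the sample frame here is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Client_ID': ['C1', 'C2'],
    'Date_1': ['no change', np.nan],
    'Date_2': ['single', 'relationship'],
    'Date_3': [np.nan, 'no change'],
})

# select every Date_* column, however many there are
dates = df.filter(like='Date_').replace('no change', np.nan)

# first/last non-NaN value per row via bfill/ffill along the columns
df['first_status'] = dates.bfill(axis=1).iloc[:, 0]
df['last_status'] = dates.ffill(axis=1).iloc[:, -1]
```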
Upvotes: 3