Larry Cai

Reputation: 60093

Pandas: How to add one extra column to indicate if there is nan data

I read the CSV data into pandas as shown below; each student gets one score per day. I want to add an extra column, "all_attendance", as an extra score.

import pandas as pd
import numpy as np

data = np.array([['','day1','day2','day3','day4','day5'],
                ['larry',1,4,7,3,5],
                ['niko',2,-1,3,np.nan,4],
                ['tin',np.nan,5,5, 6,7]])
                
df = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])
print(df) 

output

      day1 day2 day3 day4 day5
larry    1    4    7    3    5
niko     2   -1    3  nan    4
tin    nan    5    5    6    7

I want to get the result below: 1 if the student has a score for every day, 0 if any nan exists.

      day1 day2 day3 day4 day5 all_attendance
larry    1    4    7    3    5              1
niko     2   -1    3  nan    4              0
tin    nan    5    5    6    7              0

Upvotes: 1

Views: 106

Answers (3)

import pandas as pd
import numpy as np

data = np.array([['','day1','day2','day3','day4','day5'],
                ['larry',1,4,7,3,5],
                ['niko',2,-1,3,np.nan,4],
                ['tin',np.nan,5,5, 6,7]])

df = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])

# Keep the day columns before the new column is added.
columns = df.columns

# The mixed-type array stores every cell as a string,
# so a missing score shows up as the text 'nan'.
df['all_attendance'] = 1
for key, row in df.iterrows():
    for column in columns:
        if row[column] == 'nan':
            df.loc[key, column] = '0'           # show the missing score as 0
            df.loc[key, 'all_attendance'] = 0   # at least one day is missing

# astype returns a new object, so assign the converted columns back.
df[columns] = df[columns].astype(int)
print(df)

output:
      day1 day2 day3 day4 day5  all_attendance  
larry    1    4    7    3    5               1
niko     2   -1    3    0    4               0
tin      0    5    5    6    7               0

Upvotes: -1

anky

Reputation: 75110

You can replace the string 'nan' with np.nan and then check that every column in a row is non-null, using notna() followed by all() on axis=1:

df['all_attendance'] = df.replace('nan',np.nan).notna().all(axis=1).astype(int)

Or compare against the string 'nan' directly:

df['all_attendance'] = df.ne('nan').all(axis=1).astype(int)

      day1 day2 day3 day4 day5  all_attendance
larry    1    4    7    3    5               1
niko     2   -1    3  nan    4               0
tin    nan    5    5    6    7               0
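
Both one-liners compare against the text 'nan' because every cell in the question's frame is a string: NumPy stores the mixed array as a string dtype. As a minimal sketch (not part of the original answer), the cells can instead be converted to numbers first, so that missing scores become real NaN values. The frame below is rebuilt from string lists as an assumption that mirrors the question's data, and the name `numeric` is only illustrative.

```python
import pandas as pd

# Assumption: every cell is a string, as in the question's frame.
df = pd.DataFrame(
    {
        'day1': ['1', '2', 'nan'],
        'day2': ['4', '-1', '5'],
        'day3': ['7', '3', '5'],
        'day4': ['3', 'nan', '6'],
        'day5': ['5', '4', '7'],
    },
    index=['larry', 'niko', 'tin'],
)

# Convert each column to numbers; text that is not a number becomes NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# 1 when a row has no missing value, 0 otherwise.
df['all_attendance'] = numeric.notna().all(axis=1).astype(int)
print(df)
```

Working on the converted copy leaves the displayed scores untouched while the check itself runs on real missing values.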

Upvotes: 2

Hamish Gibson

Reputation: 256

You can use an apply() function to achieve this result. Please see below:

def f(row):
    if 'nan' in row.values:
        return 0
    else:
        return 1

df['all_attendance'] = df.apply(f, axis=1)
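
The function above looks for the text 'nan', which fits the question's string-valued frame. If the scores are loaded as numbers (for example by pd.read_csv), missing entries are real NaN values and the string test is not expected to match. A hedged variant for that case, with the hypothetical names `scores` and `attended_every_day`, could look like this:

```python
import numpy as np
import pandas as pd

# Assumption: scores are numeric and missing days are real NaN values.
scores = pd.DataFrame(
    [[1, 4, 7, 3, 5],
     [2, -1, 3, np.nan, 4],
     [np.nan, 5, 5, 6, 7]],
    index=['larry', 'niko', 'tin'],
    columns=['day1', 'day2', 'day3', 'day4', 'day5'],
)

def attended_every_day(row):
    # Return 0 if any score in the row is missing, otherwise 1.
    return 0 if row.isna().any() else 1

scores['all_attendance'] = scores.apply(attended_every_day, axis=1)
print(scores)
```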

Upvotes: 0
