Pari Sairam Mohan

Reputation: 421

Python updating column based on if condition

I have a list that contains two DataFrames. I am trying to update the "Dependents" column, wherever it is null, based on the value of "Married".

for dataset in data_cleaner:
    dataset[dataset.Dependents.isnull()].loc[dataset.Dependents.isnull() and dataset['Married']=='Yes' ] ='1'
    dataset[dataset.Dependents.isnull()].loc[dataset.Dependents.isnull() and dataset['Married']=='No' ] ='0'

Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried an if/else condition as well and got the same error. What am I missing or not understanding here?

Upvotes: 1

Views: 91

Answers (2)

jezrael

Reputation: 862641

The error comes from chaining boolean conditions with and. pandas compares whole arrays here, so you need the bitwise AND operator, &. Another problem is the missing parentheses around the comparison on the Married column.
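As a small illustration of the difference between and and &, here is a sketch on made-up data. The sample values and the helper names (is_missing, and_error, mask_list) are assumptions for the example; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Married": ["Yes", "No", "Yes"],
    "Dependents": [None, None, "2"],
})

is_missing = df["Dependents"].isnull()

# `and` asks Python for a single True/False from a whole Series,
# which pandas refuses with a ValueError.
try:
    is_missing and (df["Married"] == "Yes")
    and_error = None
except ValueError as exc:
    and_error = str(exc)

# `&` combines the two boolean Series element by element.
# The parentheses matter because `&` binds more tightly than `==`.
mask = is_missing & (df["Married"] == "Yes")
mask_list = mask.tolist()
```

Only the first row is both missing and married, so only that row is selected by mask.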

Because you are working with a list of DataFrames, update each DataFrame through its index in the list, data_cleaner[i].

Note: If you need numeric values, assign 1 and 0 instead of '1' and '0'.

for i in range(len(data_cleaner)):
    m1 = data_cleaner[i].Dependents.isnull()
    data_cleaner[i].loc[m1 & (data_cleaner[i]['Married']=='Yes'), 'Dependents'] ='1'
    data_cleaner[i].loc[m1 & (data_cleaner[i]['Married']=='No'), 'Dependents'] ='0'

Alternative with numpy.select:

import numpy as np

for i in range(len(data_cleaner)):
    m1 = data_cleaner[i].Dependents.isnull()
    m2 = (data_cleaner[i]['Married']=='Yes')
    m3 = (data_cleaner[i]['Married']=='No')

    # rows matching neither condition keep their existing Dependents value
    data_cleaner[i]['Dependents'] = np.select([m1 & m2, m1 & m3], 
                                              ['1','0'], 
                                              data_cleaner[i]['Dependents'])
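To see what numpy.select does with the default argument, here is a minimal sketch on a single made-up DataFrame. The sample values and the name result are assumptions; the masks mirror m1, m2 and m3 above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Married": ["Yes", "No", "Yes", "No"],
    "Dependents": [None, None, "2", "3+"],
})

m1 = df["Dependents"].isnull()
m2 = df["Married"] == "Yes"
m3 = df["Married"] == "No"

# The first matching condition picks the value for each row;
# rows that match no condition fall back to the default (the old column).
df["Dependents"] = np.select([m1 & m2, m1 & m3], ["1", "0"], default=df["Dependents"])

result = df["Dependents"].tolist()
```

The two missing rows are filled from Married, and the rows that already had a value are left unchanged.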

Or create another list of DataFrames:

out = []
for dataset in data_cleaner:
    m1 = dataset.Dependents.isnull()
    dataset.loc[m1 & (dataset['Married']=='Yes'), 'Dependents'] ='1'
    dataset.loc[m1 & (dataset['Married']=='No'), 'Dependents'] ='0'
    out.append(dataset)
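One thing worth knowing about this variant: .loc assignment changes each DataFrame in place, so out ends up holding the same objects as data_cleaner rather than copies. The sketch below checks that on made-up data (the sample values and the names same_objects, first and second are assumptions).

```python
import pandas as pd

# Hypothetical stand-in for the list of two DataFrames from the question.
data_cleaner = [
    pd.DataFrame({"Married": ["Yes", "No", "No"], "Dependents": [None, "2", None]}),
    pd.DataFrame({"Married": ["No", "Yes"], "Dependents": ["1", None]}),
]

out = []
for dataset in data_cleaner:
    m1 = dataset["Dependents"].isnull()
    dataset.loc[m1 & (dataset["Married"] == "Yes"), "Dependents"] = "1"
    dataset.loc[m1 & (dataset["Married"] == "No"), "Dependents"] = "0"
    out.append(dataset)

# Each element of `out` is the very same object as in `data_cleaner`.
same_objects = all(a is b for a, b in zip(out, data_cleaner))
first = data_cleaner[0]["Dependents"].tolist()
second = data_cleaner[1]["Dependents"].tolist()
```

If independent copies are wanted, calling dataset.copy() before the assignments would be one way to get them.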

Upvotes: 3

xiutiqianshi

Reputation: 209

You need to specify your column name.

for dataset in data_cleaner:
    dataset.loc[(dataset.Dependents.isnull()) & (dataset['Married']=='Yes'),'Dependents' ] ='1'
    dataset.loc[(dataset.Dependents.isnull()) & (dataset['Married']=='No'),'Dependents' ] ='0'

Upvotes: 1
