sooaran

Reputation: 193

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()?

I'm new to pandas, and I recently got this 'ValueError' while trying to modify columns according to some rules:

csv_input = pd.read_csv(fn, error_bad_lines=False)
if csv_input['ip.src'] == '192.168.1.100':
    csv_input['flow_dir'] = 1
    csv_input['ip.src'] = 1
    csv_input['ip.dst'] = 0
else:
    if csv_input['ip.dst'] == '192.168.1.100':
        csv_input['flow_dir'] = 0
        csv_input['ip.src'] = 0
        csv_input['ip.dst'] = 1

I searched for this error, and I guess it's caused by the 'if' statement and the '==' operator, but I don't know how to fix it.

Thanks!

Upvotes: 1

Views: 672

Answers (1)

greg_data

Reputation: 2293

So Andrew L's comment is correct, but I'm going to expand on it a bit for your benefit.

When you call, e.g.

csv_input['ip.dst'] == '192.168.1.100'

This returns a Series with the same index as csv_input, in which every value is a boolean indicating whether csv_input['ip.dst'] equals '192.168.1.100' in that row.

So, when you call

 if csv_input['ip.dst'] == '192.168.1.100':

You're asking whether that whole Series evaluates to True or False. Hopefully that explains what the message "The truth value of a Series is ambiguous." means: a Series holds one boolean per row, so it can't be boiled down to a single boolean.
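To make that concrete, here is a minimal sketch on a small made-up frame (the sample addresses and the variable names are assumptions for illustration, not your actual CSV). It shows that the comparison yields one boolean per row, that using it directly in an if raises the error from the question, and that .any() / .all() are the explicit ways to collapse it to a single boolean when that is really what you want.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({"ip.dst": ["192.168.1.100", "10.0.0.5", "192.168.1.100"]})

# The comparison is evaluated row by row and yields a boolean Series.
mask = df["ip.dst"] == "192.168.1.100"

# Using the Series directly as a condition raises the ValueError,
# because pandas refuses to guess how to reduce it to one boolean.
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)

# Explicit reductions, for the cases where a single boolean is wanted.
any_match = mask.any()  # True if at least one row matches
all_match = mask.all()  # True only if every row matches
```

Neither reduction is what the code in the question needs, though: the goal there is to change only the matching rows, which calls for row-wise selection rather than a single if.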

Now, it looks like you're trying to set the values in the flow_dir, ip.src and ip.dst columns based on the values in the ip.src and ip.dst columns.

The correct way to do this would be with .loc[], something like this:

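# equivalent to the first if statement:
# rows where ip.src matches get ip.src=1, ip.dst=0, flow_dir=1
csv_input.loc[
    csv_input['ip.src'] == '192.168.1.100',
    ['ip.src', 'ip.dst', 'flow_dir']
] = (1, 0, 1)

# equivalent to the second if statement:
# rows where ip.dst matches get ip.src=0, ip.dst=1, flow_dir=0
csv_input.loc[
    csv_input['ip.dst'] == '192.168.1.100',
    ['ip.src', 'ip.dst', 'flow_dir']
] = (0, 1, 0)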
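For something you can run end to end, here is a self-contained sketch of the same masked assignment on a small made-up frame. The sample rows, the LOCAL_IP constant and the mask names are assumptions for illustration. It also computes both masks before overwriting anything and creates flow_dir up front, which is a slightly more defensive variant than the snippet above rather than a requirement.

```python
import numpy as np
import pandas as pd

LOCAL_IP = "192.168.1.100"  # address taken from the question

# Hypothetical sample rows; object dtype lets the columns hold
# both the original strings and the 0/1 flags written below.
df = pd.DataFrame(
    {
        "ip.src": [LOCAL_IP, "10.0.0.5", "10.0.0.7", LOCAL_IP],
        "ip.dst": ["10.0.0.5", LOCAL_IP, "10.0.0.9", "10.0.0.7"],
    },
    dtype=object,
)

# Build both masks first, so the second one does not depend on
# values that the first assignment has already overwritten.
src_is_local = df["ip.src"] == LOCAL_IP
dst_is_local = ~src_is_local & (df["ip.dst"] == LOCAL_IP)  # the "else" branch

# Create the new column explicitly; rows matching neither rule stay NaN.
df["flow_dir"] = np.nan

cols = ["ip.src", "ip.dst", "flow_dir"]
df.loc[src_is_local, cols] = [1, 0, 1]
df.loc[dst_is_local, cols] = [0, 1, 0]
```

Rows that match neither rule keep their original addresses and a missing flow_dir, which mirrors what the nested if/else in the question would leave untouched.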

Upvotes: 0
