Andy

Reputation: 193

pandas dataframe apply lambda index error

I have the following code:

df2['TaxAccNo4'] = df2['TaxAccNo2'].apply(lambda x: x.split('.')[0])
df2['TaxAccNo3'] = df2['TaxAccNo2'].apply(lambda x: x.split('.')[1])

where df2 is:

     TaxAccNo2    
0    00001379.1   
1    00182218    

When I run the code I get

     TaxAccNo2   TaxAccNo4
0    00001379.1  00001379
1    00182218    00182218

and an IndexError: list index out of range for TaxAccNo3. The output I want is:

     TaxAccNo2   TaxAccNo4   TaxAccNo3
0    00001379.1  00001379    1
1    00182218    00182218    

How do I fix my code to produce that output? I assume it's giving me the error because the row at index 1 doesn't contain a '.', but I'm not sure how to handle that.

Upvotes: 0

Views: 1025

Answers (2)

Pedro Moresco

Reputation: 41

Hi, I was reviewing your code. The problem is that calling split() on a string returns a list, and for a value with no '.' that list has only one element, so indexing it with [1] raises the IndexError, as you pointed out. The solution I found is very simple: use a conditional in your code so that it does not access this index for shorter lists, as follows. Hope it helps.

df2['TaxAccNo3'] = df2['TaxAccNo2'].apply(lambda x: x.split('.')[1] if len(x.split('.'))>1 else '')
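
The same guard can also be written as a small named function that splits only once. This is a sketch rather than part of the original answer: `part_after_dot` and its `default` parameter are hypothetical names, and the empty-string default is an assumption based on the output the question asks for.

```python
def part_after_dot(value, default=""):
    """Return the piece at index 1 of value.split('.'), or `default` if there is no dot."""
    parts = value.split(".")  # split once and reuse the result
    return parts[1] if len(parts) > 1 else default


# Plain-Python check with the two values from the question
print(part_after_dot("00001379.1"))
print(repr(part_after_dot("00182218")))
```

With pandas, the function could be passed straight to apply, for example df2['TaxAccNo2'].apply(part_after_dot).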

Upvotes: 1

dataista

Reputation: 3457

As you said, the problem is that "00182218".split(".") doesn't have a [1] index, since it's the list ["00182218"].
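
The difference is easy to see on plain strings, outside pandas. A minimal check using the two values from the question (the variable names are only illustrative):

```python
with_dot = "00001379.1".split(".")
without_dot = "00182218".split(".")

print(with_dot)     # two elements, so index 1 exists
print(without_dot)  # one element, so index 1 does not exist

try:
    without_dot[1]
except IndexError as exc:
    # This is the same error that apply() surfaces for the second row
    print("IndexError:", exc)
```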

A simple solution that barely changes the code is to use a conditional expression (... if ... else ..., Python's ternary operator):

df2['TaxAccNo4'] = df2['TaxAccNo2'].apply(lambda x: x.split('.')[0])
df2['TaxAccNo3'] = df2['TaxAccNo2'].apply(lambda x: x.split('.')[1] if '.' in x else '')

Here the final '' is an empty string: the value that fills 'TaxAccNo3' when 'TaxAccNo2' doesn't contain a dot (you can replace it if you want different behaviour).

In other words: put x.split('.')[1] in df2['TaxAccNo3'] if x contains a dot, otherwise put an empty string.
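
As an alternative to apply, pandas' string accessor can handle the whole column at once. The sketch below uses Series.str.partition, which always yields three parts (before, separator, after) and leaves the last part empty when there is no dot. It assumes TaxAccNo2 holds strings; the DataFrame is rebuilt from the question's sample only so that the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question
df2 = pd.DataFrame({"TaxAccNo2": ["00001379.1", "00182218"]})

# partition returns a DataFrame with columns 0 (before), 1 (separator), 2 (after)
parts = df2["TaxAccNo2"].str.partition(".")
df2["TaxAccNo4"] = parts[0]
df2["TaxAccNo3"] = parts[2]  # empty string when the value has no dot

print(df2)
```

Unlike x.split('.')[1], partition splits only at the first dot, so a value with several dots would keep everything after the first one in TaxAccNo3.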

Upvotes: 0
