Prince Francis

Reputation: 3097

Python Dataframe - Create a new column with value based on length of existing column value

I have a dataframe as follows.

import pandas as pd

d = {'ID': ['123456789012345678', '3456789012345678']}
df = pd.DataFrame(d)

which prints as:

    ID
0   123456789012345678
1   3456789012345678

I would like to create a new column, is_valid, that is Yes if the ID value is exactly 18 characters long and No otherwise. The expected output is:

                   ID  is_valid
0  123456789012345678       Yes
1    3456789012345678        No

Currently I am using a regular expression:

import numpy as np

expr = '^[0-9]{18}$'
df['is_valid'] = np.where(df['ID'].str.match(expr), 'Yes', 'No')
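The regex and a plain length check are not quite the same test: the pattern also requires every character to be a digit. A minimal self-contained sketch, assuming a hypothetical third ID that is 18 characters long but ends in a letter, shows where the two disagree:

```python
import numpy as np
import pandas as pd

# The two IDs from the question plus a hypothetical third one:
# 18 characters long, but not all digits.
df = pd.DataFrame({'ID': ['123456789012345678',
                          '3456789012345678',
                          '12345678901234567X']})

# Regex check: exactly 18 digits, anchored at both ends.
expr = '^[0-9]{18}$'
df['is_valid'] = np.where(df['ID'].str.match(expr), 'Yes', 'No')

# Length-only check: any 18 characters pass.
df['len_is_18'] = np.where(df['ID'].str.len().eq(18), 'Yes', 'No')
print(df)
```

If IDs are guaranteed to be numeric strings, the length check is enough; otherwise the regex is the stricter option. On pandas 1.1 or later, `df['ID'].str.fullmatch('[0-9]{18}')` expresses the same rule without the anchors.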

Is there a better way to achieve this?

Upvotes: 0

Views: 115

Answers (3)

vrana95

Reputation: 521

Your code:

expr = '^[0-9]{18}$'
%timeit -n 1000 df['is_valid'] = np.where(df['ID'].str.match(expr), 'Yes', 'No')

# 320 µs ± 7.97 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

An alternative using a list comprehension:

%timeit -n 1000 df['is_valid'] = ['Yes' if len(s) == 18 else 'No' for s in df['ID']]

# 201 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
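`%timeit` is an IPython magic, so the comparison above only runs in IPython or Jupyter. A rough equivalent with the standard-library `timeit` module is sketched below; the function names are hypothetical, the third variant is the `str.len` approach from the next answer, and absolute numbers will vary by machine. With only two rows, the timings mostly measure per-call overhead rather than how each approach scales.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['123456789012345678', '3456789012345678']})
expr = '^[0-9]{18}$'


def with_regex():
    df['is_valid'] = np.where(df['ID'].str.match(expr), 'Yes', 'No')


def with_list_comprehension():
    df['is_valid'] = ['Yes' if len(s) == 18 else 'No' for s in df['ID']]


def with_str_len():
    df['is_valid'] = np.where(df['ID'].str.len().eq(18), 'Yes', 'No')


for func in (with_regex, with_list_comprehension, with_str_len):
    # Best of 5 repeats of 200 calls each, reported per call.
    best = min(timeit.repeat(func, number=200, repeat=5)) / 200
    print(f'{func.__name__}: {best * 1e6:.1f} µs per call')
```

To see how the approaches scale, repeat the measurement on a DataFrame with many more rows.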


Upvotes: 2

zipa

Reputation: 27869

You can use np.where to switch between Yes and No based on the condition:

df['is_valid'] = np.where(df.ID.str.len().eq(18), 'Yes', 'No')
#                   ID is_valid
#0  123456789012345678      Yes
#1    3456789012345678       No
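The `.str` accessor only works on string data. If the IDs were instead stored as integers (a hypothetical variant, not the question's setup), one option is to convert with `astype(str)` before taking the length, as in this sketch:

```python
import numpy as np
import pandas as pd

# Hypothetical variant: IDs stored as int64 instead of strings.
df = pd.DataFrame({'ID': [123456789012345678, 3456789012345678]})

# Convert to strings first; calling .str directly on an integer
# column raises an AttributeError.
df['is_valid'] = np.where(df['ID'].astype(str).str.len().eq(18), 'Yes', 'No')
print(df)
```

Keep in mind that integer storage drops leading zeros, so an ID such as '012345678901234567' would be counted as 17 characters. Keeping IDs as strings avoids that.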

Upvotes: 1

Chris Adams

Reputation: 18647

Use Series.str.len and Series.eq:

df['is_valid'] = df.ID.str.len().eq(18)

[out]

                   ID  is_valid
0  123456789012345678      True
1    3456789012345678     False
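A boolean column is convenient for filtering (for example `df[df['is_valid']]`). If the Yes/No labels from the question are still wanted, one way is to map the booleans afterwards. The sketch below writes the labels to a separate, hypothetically named column `is_valid_label` so both forms are visible:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['123456789012345678', '3456789012345678']})

# Boolean column: True where the ID is exactly 18 characters long.
df['is_valid'] = df['ID'].str.len().eq(18)

# Translate True/False into the labels used in the question.
df['is_valid_label'] = df['is_valid'].map({True: 'Yes', False: 'No'})
print(df)
```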

Upvotes: 2
