Reputation: 3097
I have a dataframe as follows.
import pandas as pd

d = {'ID': ['123456789012345678', '3456789012345678']}
df = pd.DataFrame(d)
which looks like this:
ID
0 123456789012345678
1 3456789012345678
I would like to create a new column is_valid that is true (Yes) if the ID value has length 18 and false (No) otherwise; that is, the expected output is
ID is_valid
0 123456789012345678 Yes
1 3456789012345678 No
Currently I am using a regular expression:
import numpy as np

expr = '^[0-9]{18}$'
df['is_valid'] = np.where(df['ID'].str.match(expr), 'Yes', 'No')
Is there a better way to achieve this?
Upvotes: 0
Views: 115
Reputation: 521
expr = '^[0-9]{18}$'
# Regex: exactly 18 digits
%timeit -n 1000 df['is_valid'] = np.where(df['ID'].str.match(expr), 'Yes', 'No')
#320 µs ± 7.97 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# List comprehension: checks the length only
%timeit -n 1000 df['validation'] = ['True' if len(s) == 18 else 'False' for s in df['ID']]
#201 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
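The same comparison can be run outside IPython with the standard-library timeit module. This is a minimal sketch, assuming the two-row sample frame from the question; with so few rows the numbers mostly measure per-call overhead, so a larger frame would give a more representative picture, and results will vary by machine and pandas version. The function names are illustrative only.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['123456789012345678', '3456789012345678']})
expr = '^[0-9]{18}$'


def with_regex(frame):
    # Regex: exactly 18 ASCII digits
    return np.where(frame['ID'].str.match(expr), 'Yes', 'No')


def with_list_comprehension(frame):
    # Plain Python loop over the strings: checks the length only
    return np.array(['Yes' if len(s) == 18 else 'No' for s in frame['ID']])


def with_str_len(frame):
    # Vectorised string length: checks the length only
    return np.where(frame['ID'].str.len().eq(18), 'Yes', 'No')


for func in (with_regex, with_list_comprehension, with_str_len):
    # Time 1000 calls and report the mean per call in microseconds
    seconds = timeit.timeit(lambda: func(df), number=1000)
    print(f'{func.__name__}: {seconds / 1000 * 1e6:.1f} µs per call')
```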
Upvotes: 2
Reputation: 27869
You can use np.where to switch between Yes and No based on a condition:
df['is_valid'] = np.where(df.ID.str.len().eq(18), 'Yes', 'No')
# ID is_valid
#0 123456789012345678 Yes
#1 3456789012345678 No
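A length check alone accepts any 18-character string, whereas the regex in the question also requires every character to be a digit. A related subtlety: in Python regular expressions `$` also matches just before a trailing newline, so `str.match('^[0-9]{18}$')` accepts an 18-digit ID followed by `\n`. A minimal sketch comparing the variants, assuming pandas 1.1 or later (where `Series.str.fullmatch` is available); the extra sample IDs and column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the last two IDs are not 18 plain digits
df = pd.DataFrame({'ID': ['123456789012345678',
                          '3456789012345678',
                          '12345678901234567X',     # 18 chars, not all digits
                          '123456789012345678\n']})  # 18 digits plus a newline

# Length only
df['len_only'] = np.where(df['ID'].str.len().eq(18), 'Yes', 'No')
# Regex from the question: '$' also matches before a trailing newline
df['regex_match'] = np.where(df['ID'].str.match('^[0-9]{18}$'), 'Yes', 'No')
# fullmatch must consume the whole string, so no anchors are needed
df['fullmatch'] = np.where(df['ID'].str.fullmatch('[0-9]{18}'), 'Yes', 'No')
# Non-regex alternative: length plus a digit check
df['len_and_digits'] = np.where(df['ID'].str.len().eq(18) & df['ID'].str.isdigit(),
                                'Yes', 'No')

print(df.drop(columns='ID'))
```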
Upvotes: 1
Reputation: 18647
Use Series.str.len and Series.eq:
df['is_valid'] = df.ID.str.len().eq(18)
[out]
ID is_valid
0 123456789012345678 True
1 3456789012345678 False
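If the Yes/No labels from the question are still wanted, the boolean column can be translated with Series.map. A minimal sketch reusing the question's sample frame; the column name is_valid_label is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['123456789012345678', '3456789012345678']})

# Boolean column, as in the answer above
df['is_valid'] = df.ID.str.len().eq(18)

# Translate True/False into the 'Yes'/'No' labels the question asks for
df['is_valid_label'] = df['is_valid'].map({True: 'Yes', False: 'No'})
print(df)
```

Keeping the boolean column is often more convenient, since it can be used directly as a filter, e.g. df[df['is_valid']].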
Upvotes: 2