Reputation: 172
I need a bit of help.
I'm pretty new to Python (I use version 3.0 bundled with Anaconda) and I want to use regex to validate/return a list of only the valid numbers that match a criterion (say \d{11} for 11 digits). I'm getting the list using Pandas:
df = pd.DataFrame(columns=['phoneNumber','count'], data=[
['08034303939',11],
['08034382919',11],
['0802329292',10],
['09039292921',11]])
When I return all the items using
for row in df.iterrows():  # dataframe.iterrows() returns tuple
    print(row[1][0])
it returns all items without regex validation, but when I try to validate with this
for row in df.iterrows():  # dataframe.iterrows() returns tuple
    print(re.compile(r"\d{11}").search(row[1][0]).group())
it raises an AttributeError (since the returned value for non-matching values is None, and None has no group() method).
How can I work around this, or is there an easier way?
Upvotes: 2
Views: 4575
Reputation: 402814
If you want to validate, you can use str.match and convert to a boolean mask using df.astype(bool):
x = df['phoneNumber'].str.match(r'\d{11}').astype(bool)
x
0     True
1     True
2    False
3     True
Name: phoneNumber, dtype: bool
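One caveat: str.match only anchors the pattern at the start of the string, so a value with more than 11 digits would still pass. A minimal sketch of the difference, assuming a pandas version that provides Series.str.fullmatch (1.1 or later); the frame name `sample` and the extra 12-digit row are hypothetical, added only for illustration:

```python
import pandas as pd

# Same data as in the question, plus one hypothetical 12-digit row
# (not in the original data) to show where the two methods differ.
sample = pd.DataFrame(
    columns=['phoneNumber', 'count'],
    data=[
        ['08034303939', 11],
        ['08034382919', 11],
        ['0802329292', 10],
        ['09039292921', 11],
        ['080343039391', 12],  # hypothetical: one digit too many
    ],
)

# str.match anchors only at the start, so the first 11 digits of the
# 12-digit value are enough for it to report True.
prefix_ok = sample['phoneNumber'].str.match(r'\d{11}')

# str.fullmatch requires the whole string to match the pattern.
exact_ok = sample['phoneNumber'].str.fullmatch(r'\d{11}')

print(prefix_ok.tolist())
print(exact_ok.tolist())
```

If fullmatch is not available in your version, anchoring the pattern yourself, as in r'\d{11}$', should give the same mask with str.match.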
You can use boolean indexing to return only rows with valid phone numbers.
df[x]
   phoneNumber  count
0  08034303939     11
1  08034382919     11
3  09039292921     11
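If you would rather keep the explicit loop from the question, the AttributeError goes away once you check the match object for None before calling group(). A rough sketch, assuming the same data as in the question; note that it swaps search for re.fullmatch so that only strings of exactly 11 digits pass, and the names `pattern` and `valid` are just illustrative:

```python
import re

import pandas as pd

df = pd.DataFrame(
    columns=['phoneNumber', 'count'],
    data=[
        ['08034303939', 11],
        ['08034382919', 11],
        ['0802329292', 10],
        ['09039292921', 11],
    ],
)

# Compile once, outside the loop, instead of on every iteration.
pattern = re.compile(r"\d{11}")

valid = []
for number in df['phoneNumber']:
    m = pattern.fullmatch(number)  # returns None when there is no match
    if m is not None:              # guard before calling .group()
        valid.append(m.group())

print(valid)
```

Iterating over the column directly also avoids unpacking the (index, row) tuples that iterrows() yields. For larger frames the boolean mask above is likely the faster option.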
Upvotes: 5