Kiran
Kiran

Reputation: 159

Using python pandas how to do some analysis for identify valid mobile numbers

I have got daily MIS Fields are " Name,Number and Location ". Now, Total I have 100 Rows data daily basis and I have to first check that the numbers are in 10 digit or not, if Number fields are 1 to 9 digit , i have to remove that entry in my MIS,

only valid number like 10 digit and additional +91 before 10 digit number is valid. so, in excel i have to daily remove that invalid numbers and all its manually.

next i have to send it to valid number in 2 branches.50% valid number in 1st branch and 50% valid number in 2nd branch,

In 1st branch there are two persons, so again I have to send to both person equally valid number data entry. So, For example : if out of 100 data rows, total valid number is 60 , Then in 1st branch total 30 valid numbers occurs, and each two person get 15-15 numbers.

In 2nd branch there are three persons, valid 30 numbers occurs and each three get 10-10-10 numbers.

Any help it will grateful.

Here is my code.

import pandas as pd
import numpy as np
df = pd.read_csv('/home/desktop/Desktop/MIS.csv')
df
      Name        Number Location
0   Jayesh        980000     Pune
1     Ajay    9890989090   Mumbai
2   Manish    9999999999     Pune
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai
6   Naresh          9000     Pune


df['Number']=df['Number'].astype(str).apply(lambda x: np.where((len(x)<=10)))

Upvotes: 2

Views: 1494

Answers (5)

gosuto
gosuto

Reputation: 5741

I suggest to use the following regex pattern:

^\+91\d{10}$|^91\d{10}$|^\d{10}$

This is assuming there are no spaces and/or brackets in your Number column. The pattern makes sure the digit part is always 10 long (no more no less) and lets it be preceded by either +91 or 91.

to build a filtered dataframe you would then:

dff = df[df['Number'].astype(str).str.match(r'^\+91\d{10}$|^91\d{10}$|^\d{10}$')]

Upvotes: 0

Karn Kumar
Karn Kumar

Reputation: 8826

If the data corresponds likely as given in example then below should work for you as per your requirement.

DataFrame:

>>> df
      Name        Number Location
0   Jayesh        980000     Pune
1     Ajay    9890989090   Mumbai
2   Manish    9999999999     Pune
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai
6   Naresh          9000     Pune

Result:

using str.match:

>>> df[df.Number.astype(str).str.match(r'^(\d{10}|\d{12})$', as_indexer=True)]
      Name        Number Location
1     Ajay    9890989090   Mumbai
2   Manish    9999999999     Pune
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai

OR

>>> df[df.Number.astype(str).str.match(r'^[0-9]{10,12}$', as_indexer=True)]
      Name        Number Location
1     Ajay    9890989090   Mumbai
2   Manish    9999999999     Pune
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai

Upvotes: 1

jpp
jpp

Reputation: 164823

It's tempting to convert your numbers to strings and then perform your comparisons. However, this isn't necessary and will typically be inefficient. You can use regular Boolean comparisons with a direct algorithm:

m1 = (np.log10(df['Number']).astype(int) + 1) == 12
m2 = (df['Number'] // 10**10) == 91

df_filtered = df[m1 & m2]

print(df_filtered)

      Name        Number Location
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai

Upvotes: 3

Vivek Kalyanarangan
Vivek Kalyanarangan

Reputation: 9081

Use -

df['Number'].astype(str).str.match(r'(\+)*(91)*(\d{10})')

Output

0    False
1     True
2     True
3     True
4     True
5     True
6    False
Name: Number, dtype: bool

Update

Use this bool series to filter -

df_filtered = df[df['Number'].astype(str).str.match(r'(\+)*(91)*(\d{10})', as_indexer=True)]


Name    Number  Location
1   Ajay    9890989090  Mumbai
2   Manish  9999999999  Pune
3   Vikram  919000000000    Pune
4   Prakash 919999999999    Mumbai
5   Rakesh  919999999998    Mumbai

Upvotes: 3

anky
anky

Reputation: 75140

for assigning nan to str which doesnot start with 91 and is less than 10 digits:

df.Number[(~df.Number.str.startswith('91',na=False))&[len(df.Number[i])!= 10 for i in df.index]] = np.nan

Upvotes: 1

Related Questions