Reputation: 159
I have got daily MIS Fields are " Name,Number and Location ". Now, Total I have 100 Rows data daily basis and I have to first check that the numbers are in 10 digit or not, if Number fields are 1 to 9 digit , i have to remove that entry in my MIS,
only valid number like 10 digit and additional +91 before 10 digit number is valid. so, in excel i have to daily remove that invalid numbers and all its manually.
next i have to send it to valid number in 2 branches.50% valid number in 1st branch and 50% valid number in 2nd branch,
In 1st branch there are two persons, so again I have to send to both person equally valid number data entry. So, For example : if out of 100 data rows, total valid number is 60 , Then in 1st branch total 30 valid numbers occurs, and each two person get 15-15 numbers.
In 2nd branch there are three persons, valid 30 numbers occurs and each three get 10-10-10 numbers.
Any help it will grateful.
Here is my code.
import pandas as pd
import numpy as np
df = pd.read_csv('/home/desktop/Desktop/MIS.csv')
df
Name Number Location
0 Jayesh 980000 Pune
1 Ajay 9890989090 Mumbai
2 Manish 9999999999 Pune
3 Vikram 919000000000 Pune
4 Prakash 919999999999 Mumbai
5 Rakesh 919999999998 Mumbai
6 Naresh 9000 Pune
df['Number']=df['Number'].astype(str).apply(lambda x: np.where((len(x)<=10)))
Upvotes: 2
Views: 1494
Reputation: 5741
I suggest to use the following regex pattern:
^\+91\d{10}$|^91\d{10}$|^\d{10}$
This is assuming there are no spaces and/or brackets in your Number
column. The pattern makes sure the digit part is always 10 long (no more no less) and lets it be preceded by either +91 or 91.
to build a filtered dataframe you would then:
dff = df[df['Number'].astype(str).str.match(r'^\+91\d{10}$|^91\d{10}$|^\d{10}$')]
Upvotes: 0
Reputation: 8826
If the data corresponds likely as given in example then below should work for you as per your requirement.
DataFrame:
>>> df
Name Number Location
0 Jayesh 980000 Pune
1 Ajay 9890989090 Mumbai
2 Manish 9999999999 Pune
3 Vikram 919000000000 Pune
4 Prakash 919999999999 Mumbai
5 Rakesh 919999999998 Mumbai
6 Naresh 9000 Pune
Result:
using str.match
:
>>> df[df.Number.astype(str).str.match(r'^(\d{10}|\d{12})$', as_indexer=True)]
Name Number Location
1 Ajay 9890989090 Mumbai
2 Manish 9999999999 Pune
3 Vikram 919000000000 Pune
4 Prakash 919999999999 Mumbai
5 Rakesh 919999999998 Mumbai
OR
>>> df[df.Number.astype(str).str.match(r'^[0-9]{10,12}$', as_indexer=True)]
Name Number Location
1 Ajay 9890989090 Mumbai
2 Manish 9999999999 Pune
3 Vikram 919000000000 Pune
4 Prakash 919999999999 Mumbai
5 Rakesh 919999999998 Mumbai
Upvotes: 1
Reputation: 164823
It's tempting to convert your numbers to strings and then perform your comparisons. However, this isn't necessary and will typically be inefficient. You can use regular Boolean comparisons with a direct algorithm:
m1 = (np.log10(df['Number']).astype(int) + 1) == 12
m2 = (df['Number'] // 10**10) == 91
df_filtered = df[m1 & m2]
print(df_filtered)
Name Number Location
3 Vikram 919000000000 Pune
4 Prakash 919999999999 Mumbai
5 Rakesh 919999999998 Mumbai
Upvotes: 3
Reputation: 9081
Use -
df['Number'].astype(str).str.match(r'(\+)*(91)*(\d{10})')
Output
0 False
1 True
2 True
3 True
4 True
5 True
6 False
Name: Number, dtype: bool
Update
Use this bool
series to filter -
df_filtered = df[df['Number'].astype(str).str.match(r'(\+)*(91)*(\d{10})', as_indexer=True)]
Name Number Location
1 Ajay 9890989090 Mumbai
2 Manish 9999999999 Pune
3 Vikram 919000000000 Pune
4 Prakash 919999999999 Mumbai
5 Rakesh 919999999998 Mumbai
Upvotes: 3
Reputation: 75140
for assigning nan to str which doesnot start with 91
and is less than 10 digits:
df.Number[(~df.Number.str.startswith('91',na=False))&[len(df.Number[i])!= 10 for i in df.index]] = np.nan
Upvotes: 1