Reputation: 71
I need to build two new columns from a list of booleans. This is for classifying phone numbers as mobile or not.
Sample data:
mobile_phone
85295649956
85398745632
8612345678945
34512654
Here is my code:
import csv
import re
import pandas as pd
import numpy as np
df = pd.read_csv('test.csv',delimiter='|',dtype = str)
a = r'852[4-9]|853[4-9]|86'
print(list(map(lambda x: bool(re.match(a, x)), df['mobile_phone'])))
The output is:
[True,True,True,False]
I can list the booleans, but I don't know how to use them to build the columns.
I tried something like this:
import csv
import re
import pandas as pd
import numpy as np
df = pd.read_csv('test.csv',delimiter='|',dtype = str)
a = r'852[4-9]|853[4-9]|86'
df['mobile'] = np.where(
(lambda x: bool(re.match(a, x)), df['mobile_phone']) = True
,df['mobile_phone']
,nan
)
df['phone'] = np.where(
(lambda x: bool(re.match(a, x)), df['mobile_phone']) = True,
nan,
df['mobile_phone']
)
I tried to use np.where, but it doesn't work: it raises the error "keyword can't be an expression".
How can I get a result like this?
Desired result:
mobile_phone mobile phone
85295649956 85295649956 nan
85398745632 85398745632 nan
8612345678945 8612345678945 nan
34512654 nan 34512654
Upvotes: 0
Views: 63
Reputation: 147166
You could just use Series.apply to process your values into new columns. For example:
import pandas as pd
import re
import math
df = pd.DataFrame({'mobile_phone': ['85295649956', '85398745632', '8612345678945', '34512654', '54861245'] })
a = r'852[4-9]|853[4-9]|86'
df['mobile'] = df['mobile_phone'].apply(lambda p: p if re.match(a, p) else math.nan)
df['phone'] = df['mobile_phone'].apply(lambda p: math.nan if re.match(a, p) else p)
df
Output:
mobile_phone mobile phone
0 85295649956 85295649956 NaN
1 85398745632 85398745632 NaN
2 8612345678945 8612345678945 NaN
3 34512654 NaN 34512654
4 54861245 NaN 54861245
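As a side note on the original attempt: writing `(...) = True` inside the call parentheses makes Python parse it as a keyword argument, which is most likely where the "keyword can't be an expression" error comes from. If you would rather avoid a per-row lambda, a vectorized variant is sketched below. It is only a sketch, assuming the column holds strings with no missing values (as with dtype=str and the sample data); the name is_mobile is just an illustrative helper, not something from the question.

```python
import pandas as pd

# Same sample numbers as in the question, kept as strings.
df = pd.DataFrame({'mobile_phone': ['85295649956', '85398745632',
                                    '8612345678945', '34512654']})

a = r'852[4-9]|853[4-9]|86'

# Series.str.match tests the pattern at the start of each string,
# like re.match, and returns a boolean Series (the mask).
is_mobile = df['mobile_phone'].str.match(a)

# Series.where keeps values where the mask is True and fills NaN elsewhere;
# Series.mask does the opposite.
df['mobile'] = df['mobile_phone'].where(is_mobile)
df['phone'] = df['mobile_phone'].mask(is_mobile)

print(df)
```

The same mask could also be passed to np.where as a plain positional argument, e.g. np.where(is_mobile, df['mobile_phone'], np.nan), which is closer to what the question tried.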
Upvotes: 1