Reputation: 2894
I have data streaming in the following format:
from StringIO import StringIO
data ="""\
ANI/IP
sip:[email protected]
sip:10.66.7.34@6665554444
sip:[email protected]
"""
import pandas as pd
df = pd.read_table(StringIO(data),sep='\s+',dtype='str')
What I would like to do is replace the column content with just the phone number part of the string above. I tried the suggestions from this thread like so:
df['ANI/IP'] = df['ANI/IP'].str.replace(r'\d{10}', '').astype('str')
print(df)
However, this results in:
.....print(df)
ANI/IP
0 sip:@10.94.2.15
1 sip:@10.66.7.34
2 sip:@10.94.2.11
I need the phone numbers, so how do I achieve this? :
ANI/IP
0 5554447777
1 6665554444
2 3337775555
Upvotes: 1
Views: 595
Reputation: 76947
You could use pandas.core.strings.StringMethods.extract
to extract
In [10]: df['ANI/IP'].str.extract("(\d{10})")
Out[10]:
0 5554447777
1 6665554444
2 3337775555
Name: ANI/IP, dtype: object
Upvotes: 1
Reputation: 375685
The regex \d{10}
searches for substring of digits precisely 10 characters long.
df['ANI/IP'] = df['ANI/IP'].str.replace(r'\d{10}', '').astype('str')
This removes the numbers!
Note: You shouldn't do astype str (it's not needed and there is no str dtype in pandas).
You want to extract these phone numbers:
In [11]: df["ANI/IP"].str.extract(r'(\d{10})') # before overwriting!
Out[11]:
0 5554447777
1 6665554444
2 3337775555
Name: ANI/IP, dtype: object
Set this as another column and you're away:
In [12]: df["phone_number"] = df["ANI/IP"].str.extract(r'(\d{10})')
Upvotes: 4