Reputation: 267
I got a big list of phone numbers in all sorts of formats:
df = pd.DataFrame(
{'phone': ['0123/12345', '0123-23456', '0123/4455-10', '0123-4455-22'],
'name': ['A-1', 'B-1', 'C/3', 'D/7']})
name phone
0 A-1 0123/12345
1 B-1 0123-23456
2 C/3 0123/4455-10
3 D/7 0123-4455-22
The formats I want are in rows #0 and #2.
When I concentrate on #1, I tried the following:
df.loc[(df.phone.str.count('-')==1) &
(df.phone.str.count('/')==0)].apply(lambda x: x.str.replace('-', '/'))
And this does the trick on the number, but unfortunately also on the name column:
name phone
1 B/1 0123/23456
But the name column must not be changed.
So I have two questions:
Upvotes: 0
Views: 6474
Reputation: 215047
You can use regex replace (str.replace
method) on column phone only:
df['phone'] = df.phone.str.replace("^(\d+)-(.*)$", r"\1/\2")
df
# name phone
#0 A-1 0123/12345
#1 B-1 0123/23456
#2 C/3 0123/4455-10
#3 D/7 0123/4455-22
Explanation on the regex:
^(\d+)-(.*)$
matches a string that starts with digits and immediately followed by dash which is the case for row #0 and row #2; With back reference, it replaces the first dash with /
and for row #1 and row #3, since they don't match the regex, no modification is applied.
Upvotes: 7
Reputation: 2424
Or if you're no fan of regex (like me), you can simply do this:
df['phone'] = df.phone.apply(lambda x: x.replace('-','/',1) if '/' not in x else x)
print(df)
name phone
0 A-1 0123/12345
1 B-1 0123/23456
2 C/3 0123/4455-10
3 D/7 0123/4455-22
Probably not the best or fastest way, still I feel more comfortable with it since I don't know regex yet.
Hope that was helpful.
Upvotes: 1