Reputation: 3
I have a dataframe with several columns of user information where I have the columns "Contact 1" and "Contact 2".
import pandas as pd

d = {'Contact 1': ['1234567891 1234567891', '12345678 12345678', '12345678 1234567891', '1234567891 12345678', '1234567 1234567891',
                   '1234567891', '123456789 12345678911', None],
     'Contact 2': [None, None, None, None, None, '12345678', None, None]}
df = pd.DataFrame(data=d)
Contact 1 | Contact 2 |
---|---|
1234567891 1234567891 | None |
12345678 12345678 | None |
12345678 1234567891 | None |
1234567891 12345678 | None |
1234567 1234567891 | None |
1234567891 | 12345678 |
123456789 12345678911 | None |
None | None |
I want to split the "Contact 1" column on the space between the numbers, but only when the value is an 8- or 10-digit number, followed by a space, followed by another 8- or 10-digit number. I also need to preserve the little information I already have in the "Contact 2" column.
I tried the following code:
df[['Contact 1', 'Contact 2']]=df['Contact 1'].str.split(r'(?<=^\d{8}|\d{10})\s(?=\d{8}|\d{10}$)', n=1, expand=True)
but I get the error "re.error: look-behind requires fixed-width pattern"
I would like to get the following result:
Contact 1 | Contact 2 |
---|---|
1234567891 | 1234567891 |
12345678 | 12345678 |
12345678 | 1234567891 |
1234567891 | 12345678 |
1234567 1234567891 | None |
1234567891 | 12345678 |
123456789 12345678911 | None |
None | None |
Upvotes: 0
Views: 88
Reputation: 521194
Using str.extract:
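import numpy as np

# (?:\d{8}|\d{10}) accepts exactly 8 or 10 digits; \d{8,10} would also accept 9.
# expand=False returns a Series, so np.where gets three inputs of the same shape.
df["Contact 2"] = np.where(df["Contact 2"].isnull(),
                           df["Contact 1"].str.extract(r'^(?:\d{8}|\d{10}) (\d{8}|\d{10})$', expand=False),
                           df["Contact 2"])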
Also we need to update the first column:
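# regex=True is required in recent pandas versions, where str.replace defaults to literal matching.
df["Contact 1"] = df["Contact 1"].str.replace(r'^(\d{8}|\d{10}) (?:\d{8}|\d{10})$', r'\1', regex=True)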
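For reference, here is a minimal self-contained sketch of the same idea that fills both columns from a single extract. It is only one possible way to write it: the names `pattern` and `parts` are made up for this example, and using `combine_first` is my own choice rather than something the question requires. It also avoids the look-behind entirely, which is what raised the error in the question, because Python's `re` module only accepts fixed-width look-behind patterns.

```python
import pandas as pd

d = {'Contact 1': ['1234567891 1234567891', '12345678 12345678', '12345678 1234567891',
                   '1234567891 12345678', '1234567 1234567891', '1234567891',
                   '123456789 12345678911', None],
     'Contact 2': [None, None, None, None, None, '12345678', None, None]}
df = pd.DataFrame(data=d)

# Hypothetical helper name: each number must be exactly 8 or exactly 10 digits.
pattern = r'^(\d{8}|\d{10}) (\d{8}|\d{10})$'

# Rows that do not match the whole pattern get NaN in both capture groups.
parts = df['Contact 1'].str.extract(pattern)

# Use the extracted value where there is one, otherwise keep the existing value.
df['Contact 2'] = parts[1].combine_first(df['Contact 2'])
df['Contact 1'] = parts[0].combine_first(df['Contact 1'])

print(df)
```

Rows that do not match (a single number, a 7-, 9- or 11-digit number, or a missing value) are left as they were, so the existing "Contact 2" entry is kept.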
Upvotes: 2
Reputation: 16147
If you are interested in a non-regex solution:
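Create a mask of rows that meet your conditions
# Missing values do not split into a list, so they are excluded from the mask explicitly.
m = df['Contact 1'].str.split().apply(lambda x: isinstance(x, list) and all(len(n) in (8, 10) for n in x))
Update df with the split/expanded values
# df.update skips NaN values, so an existing "Contact 2" entry is kept when there is no second number.
df.update(df.loc[m, 'Contact 1'].str.split(expand=True).rename(columns={0: 'Contact 1',
                                                                        1: 'Contact 2'}), overwrite=True)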
Upvotes: 0