Reputation: 529
I have a 'Column' in my dataframe that, in optimal conditions, looks like this:
Client: Stack Overflow Order Num: 123456 Account From: 3656645654 Account to: 546546578
I want to split this column in several columns like:
'Client','Order Num', 'Account From','Account to'
But in some cases i don't have the client, order num and accounts in the column
I'am doing it in this way:
for x in len(df.columns):
if 'Client' in df.loc[x,'Columnn']:
df.loc[x,'Client'] = str(df.loc[x,'Column']).split('Client: ')[1]
if 'Order Num' in df.loc[x,'Client']:
df.loc[x,'Client'] = str(df.loc[x,'Client']).split('Order Num: ')[0]
if 'Account From' in df.loc[x,'Client']:
df.loc[x,'Client'] = str(df.loc[x,'Client']).split('Account From: ')[0]
if 'Account to' in df.loc[x,'Client']:
df.loc[x,'Client'] = str(df.loc[x,'Client']).split('Account to: ')[0]
else:
df.loc[x,'Client'] = ''
And so on for all the columns I want to create.
This part of the script is almost 40 lines and is very slow.
Do you have a more 'pandastic' solution?
Upvotes: 0
Views: 62
Reputation: 153460
Try this using the string accessor, .str
and extract
with named groups using regex:
df['col1'].str.extract('Client: (?P<Client>.*) Order Num: (?P<OrderNum>.*) Account From: (?P<AccountFrom>.*) Account to: (?P<AccountTo>.*)')
Output:
Client OrderNum AccountFrom AccountTo
0 Stack Overflow 123456 3656645654 546546578
Upvotes: 2