David López
David López

Reputation: 529

Pandas split column in several columns throug string replacement or regex

I have a 'Column' in my dataframe that, in optimal conditions, looks like this:

Client: Stack Overflow   Order Num: 123456  Account From: 3656645654   Account to: 546546578

I want to split this column in several columns like:

'Client','Order Num', 'Account From','Account to'

But in some cases i don't have the client, order num and accounts in the column

I'am doing it in this way:

for x in len(df.columns):
   if 'Client' in df.loc[x,'Columnn']:
      df.loc[x,'Client'] = str(df.loc[x,'Column']).split('Client: ')[1]
      if 'Order Num' in df.loc[x,'Client']:
         df.loc[x,'Client'] = str(df.loc[x,'Client']).split('Order Num: ')[0]
      if 'Account From' in df.loc[x,'Client']:
         df.loc[x,'Client'] = str(df.loc[x,'Client']).split('Account From: ')[0]
      if 'Account to' in df.loc[x,'Client']:
         df.loc[x,'Client'] = str(df.loc[x,'Client']).split('Account to: ')[0]
   else:
      df.loc[x,'Client'] = ''

And so on for all the columns I want to create.

This part of the script is almost 40 lines and is very slow.

Do you have a more 'pandastic' solution?

Upvotes: 0

Views: 62

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

Try this using the string accessor, .str and extract with named groups using regex:

df['col1'].str.extract('Client: (?P<Client>.*) Order Num: (?P<OrderNum>.*) Account From: (?P<AccountFrom>.*) Account to: (?P<AccountTo>.*)')

Output:

             Client OrderNum   AccountFrom  AccountTo
0  Stack Overflow    123456   3656645654    546546578

Upvotes: 2

Related Questions