Rachuri Ravi Teja
Rachuri Ravi Teja

Reputation: 33

Split the text and making new columns based on the text

I have a dataframe with a text column that I would like to split into multiple columns since the text string contains multiple variables, such as

df = pd.read_csv('C:/Users/mydata.csv')
print(df)
  my_msg
1 "Acct:XXXXXX0000 Debit:NGN2,000.00 Details:ATM CASH WITHDRAWAL"
2 "Acct:XXXXXX0000 Credit:NGN135,000.00 Details:BY UE Date:03-05-2019 10:03 
    Available Bal:NGN135,454.78 Enquiries:123456"
3 "Prepaid Card Alert **** POS  : Dr  NGN 4,052.50 Desc: 
   GTB/*******330/*****939 9000005600 NDate: 03-05-2019 09:36 Bal : NGN 
   506,265.00 FEEDBACK? Call 123456"
4 "Acct:XXXXXX0001 Debit:NGN300.00 Details:MOBILE BANKING300.00 
   IRTIME********7061 Date:03-05-2019 00:09 Available Bal:NGN373,358.56 
   Enquiries:12346"



I'm expecting an output in following way

Acct    Debit   Credit  Bal Pos
0000    2000    NA      NA  NA
0000    NA  135000   135454 NA
NA      NA      NA   506265 4052
0001    300     NA   373358 NA

Upvotes: 0

Views: 49

Answers (1)

Rakesh
Rakesh

Reputation: 82755

Using regex.

Ex:

import pandas as pd

df = pd.DataFrame({"Col": ["Acct:XXXXXX0000 Debit:NGN2,000.00 Details:ATM CASH WITHDRAWAL",
                           "Acct:XXXXXX0000 Credit:NGN135,000.00 Details:BY UE Date:03-05-2019 10:03 Available Bal:NGN135,454.78 Enquiries:123456",
                           "Prepaid Card Alert **** POS  : Dr  NGN 4,052.50 Desc: GTB/*******330/*****939 9000005600 NDate: 03-05-2019 09:36 Bal : NGN 506,265.00 FEEDBACK? Call 123456",
                           "Acct:XXXXXX0001 Debit:NGN300.00 Details:MOBILE BANKING300.00 IRTIME********7061 Date:03-05-2019 00:09 Available Bal:NGN373,358.56 Enquiries:12346"
                           ]})

df["Acct"] = df["Col"].str.extract(r"Acct\s*:\s*XXXXXX(\d+)\s+")
df["Debit"] = df["Col"].str.extract(r"Debit\s*:\s*NGN\s*([0-9,\.]+)\s+")
df["Credit"] = df["Col"].str.extract(r"Credit\s*:\s*NGN\s*([0-9,\.]+)\s+")
df["Bal"] = df["Col"].str.extract(r"Bal\s*:\s*NGN\s*([0-9,\.]+)\s+")
df["Pos"] = df["Col"].str.extract(r"POS\s*:\s*Dr\s*NGN\s*([0-9,\.]+)\s+")

print(df)

Output:

                                                 Col  Acct     Debit  \
0  Acct:XXXXXX0000 Debit:NGN2,000.00 Details:ATM ...  0000  2,000.00   
1  Acct:XXXXXX0000 Credit:NGN135,000.00 Details:B...  0000       NaN   
2  Prepaid Card Alert **** POS  : Dr  NGN 4,052.5...   NaN       NaN   
3  Acct:XXXXXX0001 Debit:NGN300.00 Details:MOBILE...  0001    300.00   

       Credit         Bal       Pos  
0         NaN         NaN       NaN  
1  135,000.00  135,454.78       NaN  
2         NaN  506,265.00  4,052.50  
3         NaN  373,358.56       NaN  

Upvotes: 1

Related Questions