The Singularity
The Singularity

Reputation: 2698

Using Regex to extract Data to different Columns in Pandas

I'm working with the following DataFrame column containing Date |TimeStamp | Name | Message as a string

59770        [08/10/18, 5:57:43 PM] Luke: Message
59771   [08/10/18, 5:57:48 PM] Luke: Message
59772     [08/10/18, 5:57:50 PM] Luke: Message

I use the following function to capture the Date.

def getdate(x):
    res = re.search("\d\d/\d\d/\d\d",x)

and the following code to capture the rest of the data (TimeStamp | Name | Message) into columns:

df['Data'].str.extract(r'\s*(.{10})](.*):(.*)')

Is there a workaround to capture and extract all 4 entities together?

Please Advise

Upvotes: 2

Views: 699

Answers (2)

n1colas.m
n1colas.m

Reputation: 3989

As an alternative you could use regex named groups together with pandas extractall.

import pandas as pd
import re

df = pd.DataFrame(
    ["        [08/10/18, 5:57:43 PM] Luke: Message",
     "   [08/10/18, 5:57:48 PM] Luke: Message",
     "     [08/10/18, 5:57:50 PM] Luke: Message"])

print(df)

regex = re.compile(\
    r"(?P<date>\d{2}/\d{2}/\d{2}),\s*"
    r"(?P<timestamp>\d+:\d+:\d+\s[AP]M)\]\s+"
    r"(?P<name>.+?):\s*"
    r"(?P<message>.+)$"
    )

df_out = df[0].str.extractall(regex).droplevel(1)
print(df_out)

Output from df_out

       date   timestamp  name  message
0  08/10/18  5:57:43 PM  Luke  Message
1  08/10/18  5:57:48 PM  Luke  Message
2  08/10/18  5:57:50 PM  Luke  Message

Upvotes: 2

Farhad Sarvari
Farhad Sarvari

Reputation: 1081

I change the format of each line as follow in file "file.csv" :

08/10/18, 5:57:43 PM, Luke, Message

And then used from this code to read it as data frame :

 import pandas as pd
 df = pd.read_csv("file.csv")
 print (df)

OutPut:

  Date         time   name       msg

0 08/10/18 5:57:43 PM Luke Message

Suppose your data is in file "file_data.txt" as follow format:

  [08/10/18, 5:57:43 PM] Luke: Message
  [08/10/18, 5:57:48 PM] Luke: Message
  [08/10/18, 5:57:50 PM] Luke: Message

you can use from thease sed commands to convert data to csv :

 sed -i "s/]/,/"  file_data.txt
 sed -i "s/\[//"  file_data.txt
 sed -i "s/:/,/"  file_data.txt  

Upvotes: 0

Related Questions