abhi

Reputation: 399

extract certain words from column in a pandas df

I have a pandas df in which one column, message, holds a string like the one below:

df['message']

2020-09-23T22:38:34-04:00 mpp-xyz-010101-10-103.vvv0x.net patchpanel[1329]: RTP:a=end pp=10.10.10.10:9999 user=sip:[email protected];tag=2021005845 lport=12270 raddr=11.00.111.212 rport=3004 d=5 arx=0.000 tx=0.000 fo=0.000 txf=0.000 bi=11004 bo=453 pi=122 pl=0 ps=0 rtt="" font=0 ua=funny-SDK-4.11.2.34441.fdc6567fW jc=10 no-rtp=0 cid=2164444 relog=0 vxdi=0 vxdo=0 vxdr=0\n

I want to extract the raddr value from each message and add it to the df as a new column. I am doing it with the code below, on the assumption that raddr is at position 7 after the split:

df[['raddr']]=df['message'].str.split(' ', 100, expand=True)[[7]]
df['raddr']=df['raddr'].str[6:]

The issue is that in some rows raddr is at position 8 and in others at position 7, so for some rows this gives me rport instead of raddr.

How can I extract it with a string search instead of a split?

Note: I also want a fast approach, as I am doing this on hundreds of thousands of records every minute.

Upvotes: 0

Views: 391

Answers (2)

RichieV

Reputation: 5183

You can use Series.str.extract:

df['raddr'] = df['message'].str.extract(r'raddr=([\d\.]*)') # not tested

The pattern has only one capturing group with the value after the equal sign. It will capture any combination of digits and periods until it finds something else (a blank space, letter, symbol, or end of line).
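A minimal, self-contained sketch of this approach, assuming the column is named `message` as in the question. The messages are shortened and the second and third rows are made up for illustration. Two details here are my own choices rather than part of the answer above: `expand=False`, which makes `str.extract` return a Series instead of a one-column DataFrame, and `+` in place of `*`, so that an empty `raddr=` produces NaN rather than an empty string.

```python
import pandas as pd

# Shortened, made-up messages; raddr sits at a different position in each row.
df = pd.DataFrame({
    "message": [
        "patchpanel[1329]: RTP:a=end pp=10.10.10.10:9999 lport=12270 raddr=11.00.111.212 rport=3004 d=5",
        "patchpanel[1329]: RTP:a=end extra=1 pp=10.10.10.10:9999 lport=12270 raddr=10.1.2.3 rport=3004",
        "patchpanel[1329]: no address here",
    ]
})

# expand=False returns a Series; rows without a match become NaN.
df["raddr"] = df["message"].str.extract(r"raddr=([\d.]+)", expand=False)
print(df["raddr"].tolist())
```

Because the pattern searches for the `raddr=` key itself, the result does not depend on how many space-separated fields come before it.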

Upvotes: 2

Anjaly Vijayan

Reputation: 267

>>> import re
>>> s = '2020-09-23T22:38:34-04:00 mpp-xyz-010101-10-103.vvv0x.net patchpanel[1329]: RTP:a=end pp=10.10.10.10:9999 user=sip:[email protected];tag=2021005845 lport=12270 raddr=11.00.111.212 rport=3004 d=5 arx=0.000 tx=0.000 fo=0.000 txf=0.000 bi=11004 bo=453 pi=122 pl=0 ps=0 rtt="" font=0 ua=funny-SDK-4.11.2.34441.fdc6567fW jc=10 no-rtp=0 cid=2164444 relog=0 vxdi=0 vxdo=0 vxdr=0\n'
>>> re.search(r'raddr=.*?\s', s).group()
'raddr=11.00.111.212 '
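Note that the match above still includes the `raddr=` prefix and the trailing space. A small sketch of one way to return only the value, using a capturing group and a pattern compiled once; the helper name `extract_raddr` and the shortened sample string are hypothetical, introduced here for illustration only.

```python
import re

# Compile once so the same pattern object is reused for every message.
RADDR_RE = re.compile(r"raddr=(\S+)")

def extract_raddr(message):
    """Return the text after 'raddr=' up to the next whitespace, or None if absent."""
    match = RADDR_RE.search(message)
    return match.group(1) if match else None

# Shortened version of the sample message from the question.
s = "lport=12270 raddr=11.00.111.212 rport=3004 d=5\n"
print(extract_raddr(s))
print(extract_raddr("no address here"))
```

To apply it to the column from the question, something like `df['message'].map(extract_raddr)` should work, assuming every entry in the column is a string.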

Upvotes: 0
