Matadora

Reputation: 17

I would like to extract a certain part of a string from a CSV file

I have a vast number of columns containing this kind of data:

DE-JP-202/2066/[email protected]/68
NL-LK-02206/2136/[email protected]/731
OM-PH-31303222/3671/[email protected]/524

I would like to extract the string between '@' and '.' and the string between '.' and '/' into two separate columns.

Like:

txt 1      txt 2
qwier       cu
ozmmft      de
jtqy        ml

Tried:

x = dane.str.extract(r'@(?P<txt1>\d)\.(?P<txt2>[ab\d])/')

But it doesn't work.

Upvotes: 0

Views: 543

Answers (2)

The fourth bird

Reputation: 163362

If you want to get 2 capturing groups, you could use 2 negated character classes.

In the first group, match any character except a dot, one or more times: [^.]+

In the second group, match any character except a forward slash, one or more times: [^/]+

@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)/
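As a minimal sketch of how this pattern could be plugged into the `str.extract` call from the question: the original attempt fails because `\d` matches only a single digit and `[ab\d]` matches a single character that is `a`, `b`, or a digit. The sample values below are assumptions; in particular the `user` part before the `@` is a placeholder, since the real addresses are not shown.

```python
import pandas as pd

# Hypothetical sample data: "user" before the '@' is a placeholder,
# because the real local parts are not visible in the question.
dane = pd.Series([
    "DE-JP-202/2066/user@qwier.cu/68",
    "NL-LK-02206/2136/user@ozmmft.de/731",
    "OM-PH-31303222/3671/user@jtqy.ml/524",
])

# txt1: everything after '@' up to the first dot
# txt2: everything after that dot up to the next forward slash
pattern = r"@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)/"

# With named groups, str.extract returns a DataFrame whose columns
# are named after the groups (txt1 and txt2).
x = dane.str.extract(pattern)
print(x)
```

Rows that do not match the pattern would get NaN in both columns, so it may be worth checking `x.isna().any()` on real data.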


Upvotes: 3

coldsoup

Reputation: 31

If all of your strings contain exactly one @ and one ., you can do the following:

s = 'DE-JP-202/2066/[email protected]/68'

column1 = s.split('@')[1].split('.')[0]

column2 = s.split('@')[1].split('.')[1].split('/')[0]
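The same idea can be sketched with `str.partition`, which avoids an IndexError when a delimiter is missing. The helper name `extract_parts` and the `user` placeholder before the `@` are assumptions made for illustration, not part of the original data.

```python
def extract_parts(s):
    """Return (txt1, txt2): the text between '@' and the next '.',
    and the text between that '.' and the next '/'."""
    _, _, after_at = s.partition("@")        # everything after the first '@'
    txt1, _, rest = after_at.partition(".")  # up to the first '.' after '@'
    txt2, _, _ = rest.partition("/")         # up to the next '/'
    return txt1, txt2


# Hypothetical input: "user" is a placeholder for the hidden local part.
s = "DE-JP-202/2066/user@qwier.cu/68"
column1, column2 = extract_parts(s)
print(column1, column2)
```

Unlike chained `split` calls with indexing, `partition` returns empty strings for missing pieces, so a string without an '@' yields two empty strings instead of raising.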

Upvotes: 0
