Omar14

Reputation: 2055

Pandas: trying to extract two different patterns

To extract a specific number from a string I'm using:

df['URL'].str.extract(r'dir=sale.aspx\%3fvpid%\w{2}(\d+)\%*',expand=False)

Example string:

'a'|'b'|'c'|'d|'0CCC63BF60D2&dir=sale.aspx%3fvpid%3d49398%26utm_source%xyz'|'e'

Here I want to extract: 49398

I also have to extract a second pattern, in the same code, from this kind of string:

'a'|'b'|'c'|'d'|'6A5528CD54F4&dir=sale.aspx&vpid=66395&utm_source=abc'|'a'

Here I want to extract: 66395

I need a single expression that tries both patterns.

I'm using Python 2.7.

Upvotes: 2

Views: 41

Answers (1)

Quang Hoang

Reputation: 150785

You can try this pattern, which uses a non-capturing group to accept either prefix before the digits:

pattern = r'dir=sale.aspx(?:\%3fvpid%\w{2}|\&vpid=)(\d+)\%*'

import pandas as pd

# test data
df = pd.DataFrame({"URL":[
    "'a'|'b'|'c'|'d|'0CCC63BF60D2&dir=sale.aspx%3fvpid%3d49398%26utm_source%xyz'|'e'",
    "'a'|'b'|'c'|'d'|'6A5528CD54F4&dir=sale.aspx&vpid=66395&utm_source=abc'|'a'"
]})

# extract the captured digits; both URL shapes go through the same pattern
df.URL.str.extract(pattern)

Output:

       0
0  49398
1  66395
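
To see why one pattern covers both URL shapes, here is a minimal sketch that runs the same regex through the plain `re` module, outside pandas. The helper name `extract_vpid` and the third, non-matching string are assumptions added for illustration; they are not part of the question.

```python
import re

# Same pattern as above. (?:...) groups the two alternative prefixes
# without creating a capture group, so (\d+) remains group 1.
pattern = re.compile(r'dir=sale.aspx(?:\%3fvpid%\w{2}|\&vpid=)(\d+)\%*')

urls = [
    "'a'|'b'|'c'|'d|'0CCC63BF60D2&dir=sale.aspx%3fvpid%3d49398%26utm_source%xyz'|'e'",
    "'a'|'b'|'c'|'d'|'6A5528CD54F4&dir=sale.aspx&vpid=66395&utm_source=abc'|'a'",
    "no vpid here",  # hypothetical non-matching input
]

def extract_vpid(url):
    """Return the digits after vpid, or None when neither prefix matches."""
    match = pattern.search(url)
    return match.group(1) if match else None

results = [extract_vpid(u) for u in urls]
print(results)
```

The first alternative, `\%3fvpid%\w{2}`, handles the percent-encoded form (`%3fvpid%3d`), and the second, `\&vpid=`, handles the plain form. In pandas, rows where neither alternative matches come back as NaN rather than None.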

Upvotes: 2
