Reputation: 2055
To extract a specific number from a string, I'm using:
df['URL'].str.extract(r'dir=sale.aspx\%3fvpid%\w{2}(\d+)\%*',expand=False)
Example of string:
'a'|'b'|'c'|'d|'0CCC63BF60D2&dir=sale.aspx%3fvpid%3d49398%26utm_source%xyz'|'e'
Here I want to extract: 49398
I also have to match a second pattern in the same code, for this kind of string:
'a'|'b'|'c'|'d'|'6A5528CD54F4&dir=sale.aspx&vpid=66395&utm_source=abc'|'a'
Here I want to extract: 66395
I need something that tries two different patterns.
I'm using Python 2.7.
Upvotes: 2
Views: 41
Reputation: 150785
You can try this pattern, which uses a non-capturing group (?:...|...) to accept either form before the digits:
pattern = r'dir=sale.aspx(?:\%3fvpid%\w{2}|\&vpid=)(\d+)\%*'
import pandas as pd
# test data
df = pd.DataFrame({"URL":[
"'a'|'b'|'c'|'d|'0CCC63BF60D2&dir=sale.aspx%3fvpid%3d49398%26utm_source%xyz'|'e'",
"'a'|'b'|'c'|'d'|'6A5528CD54F4&dir=sale.aspx&vpid=66395&utm_source=abc'|'a'"
]})
# extract the captured digits (one output column per capture group)
df.URL.str.extract(pattern)
Output:
       0
0  49398
1  66395
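To check the alternation without pandas, here is a minimal sketch using the standard `re` module. It assumes a slightly tightened variant of the pattern above: the dot is escaped and `%3d` is spelled out instead of `%\w{2}`. The names `PATTERN` and `extract_vpid` are made up for illustration and are not part of the original answer.

```python
import re

# (?:A|B) is a non-capturing group: it matches either the URL-encoded form
# (%3f and %3d are the encoded '?' and '=') or the plain '&vpid=' form.
# Only the digits in (\d+) are captured.
PATTERN = re.compile(r'dir=sale\.aspx(?:%3fvpid%3d|&vpid=)(\d+)')


def extract_vpid(url):
    """Return the vpid digits as a string, or None when neither form matches."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


urls = [
    "'a'|'b'|'c'|'d|'0CCC63BF60D2&dir=sale.aspx%3fvpid%3d49398%26utm_source%xyz'|'e'",
    "'a'|'b'|'c'|'d'|'6A5528CD54F4&dir=sale.aspx&vpid=66395&utm_source=abc'|'a'",
]
results = [extract_vpid(u) for u in urls]
```

If the same tightened pattern string is passed to `df['URL'].str.extract(..., expand=False)`, rows that match neither form should come back as NaN rather than raising an error. If the encoded form can also appear in upper case (`%3F`, `%3D`), compiling with `re.IGNORECASE` is one option.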
Upvotes: 2