Reputation: 23
I am trying to extract the number that sits between the two double underscores. For example, in the text below:
https://http-google-ghh.vault.com__929091__2.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
I need to get only the numbers: 929091, 929092, etc.
I tried '_(.*)_'
but the match includes the underscores too. I just need the number.
Upvotes: 0
Views: 59
Reputation: 18631
Use
re.findall(r'__([0-9]+)__', s)
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  __           '__'
--------------------------------------------------------------------------------
  (            group and capture to \1:
--------------------------------------------------------------------------------
    [0-9]+     any character of: '0' to '9' (1 or more times,
               matching as many as possible)
--------------------------------------------------------------------------------
  )            end of \1
--------------------------------------------------------------------------------
  __           '__'
import re
s = r"""https://http-google-ghh.vault.com__929091__2.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0"""
print(re.findall(r'__([0-9]+)__', s))
Results: ['929091', '929092', '929090', '929092', '1205024', '929090', '929092', '1205024']
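As for why the original attempt kept the underscores: with '_(.*)_' the greedy .* runs from the first underscore to the last one on the line, so the inner underscores fall inside the captured group. A minimal sketch comparing the two patterns on one of the sample lines (the variable names here are only for illustration):

```python
import re

line = "https://http-google-ghh.vault.com__929091__2.0"

# Original attempt: the first '_' matches the first underscore, the greedy .*
# then extends up to the last underscore, so the group keeps the inner ones.
greedy = re.findall(r'_(.*)_', line)

# Requiring both double underscores and allowing only digits in the group
# leaves just the number in the capture.
digits = re.findall(r'__([0-9]+)__', line)

print(greedy)  # ['_929091_']
print(digits)  # ['929091']
```

If integers are needed rather than strings, the results can be converted afterwards, for example with [int(n) for n in digits].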
Upvotes: 1