Reputation: 991
I am scraping data from a webpage, and the received data usually looks as follows:
233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947 266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947
I am trying to split the data into records that run from the pattern ###### (6 digits, e.g. 233989) to the phone number, which marks the end of the current record (e.g. (814) 865-8947). Because I know each record always ends with 4 digits, I came up with the expression:
(^[0-9]{1,6}$[^[0-9]{1,4}$]*[0-9]{1,4}$+)+
This does not seem to work though. Can anyone lend a helping hand?
Upvotes: 2
Views: 68
Reputation: 72905
You could use this:
r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?'
Then rebuild the text by replacing each match with $1\n
Like so: http://regex101.com/r/lG4gG5
Python:
import re
s = '233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947 266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947'
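# re.split keeps the captured records but also returns empty strings between them; filter those out.
spl = [part for part in re.split(r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?', s) if part]
for line in spl:
    print(line)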
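As an alternative sketch, re.findall can return the records directly, with no splitting step. It assumes every record starts with a 6-digit ID and ends with a phone number shaped like (814) 865-8947. The names RECORD and split_records are hypothetical, chosen only for this illustration. As for the expression in the question, it likely fails because ^ and $ anchor to the start and end of the input rather than to digit boundaries, so a $ in the middle of the pattern cannot be followed by more text.

```python
import re

# Hypothetical helper; the pattern is the one above without the capture group.
RECORD = re.compile(r'\d{6}.*?\(\d{3}\) \d{3}-\d{4}')

def split_records(text):
    """Return each record, from its 6-digit ID through the trailing phone number."""
    return RECORD.findall(text)

s = ('233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947 '
     '266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947')

records = split_records(s)
for record in records:
    print(record)

# The "$1\n" rebuild: in Python the backreference is written \1.
rebuilt = re.sub(r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?', r'\1\n', s)
```

The lazy .*? matters here: a greedy .* would run to the last phone number in the input and return everything as a single record.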
Upvotes: 1