pattmorter

Reputation: 991

Python Regular Expression of long complex string

I am scraping data from a webpage, and the received data usually looks like this:

233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947 266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947

I am trying to split the data into records, each running from the six-digit number (e.g. 233989) to the phone number that marks the end of the record (e.g. (814) 865-8947). Because I know each record always ends with four digits, I came up with this expression:

(^[0-9]{1,6}$[^[0-9]{1,4}$]*[0-9]{1,4}$+)+

This does not seem to work though. Can anyone lend a helping hand?

Upvotes: 2

Views: 68

Answers (1)

brandonscript

Reputation: 72905

You could use this:

r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?'

Then replace each match with $1\n, which puts every record on its own line.

Like so: http://regex101.com/r/lG4gG5
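In Python, the "$1\n" replacement is written with `re.sub` and the backreference `\1`. The following is only a minimal sketch of that step, not necessarily what the linked demo does; the names `pattern`, `rebuilt`, and `records` are made up for illustration.

```python
import re

s = ('233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947 '
     '266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947')

# Six digits, then as little as possible up to a "(ddd) ddd-dddd" phone number,
# followed by an optional space that is consumed but not captured.
pattern = r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?'

# Python spells the first capture group as \1 rather than $1.
rebuilt = re.sub(pattern, r'\1\n', s)

# One record per line.
records = rebuilt.splitlines()
for record in records:
    print(record)
```

The lazy `.*?` is what stops each match at the first phone number instead of running on to the last one in the string.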

Python:

import re

s = '233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947 266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947'
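# re.split keeps the captured records but leaves empty strings between them
spl = re.split(r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?', s)
for line in spl:
    if line:  # skip the empty pieces
        print(line)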
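As an alternative sketch (not part of the original answer), `re.findall` with the same pattern minus the capture group returns the records directly, so there are no empty strings to filter out. The names `record_re` and `records` are illustrative.

```python
import re

s = ('233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947 '
     '266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947')

# No capture group here, so findall returns each whole match as a string.
record_re = re.compile(r'\d{6}.*?\(\d{3}\) \d{3}-\d{4}')

records = record_re.findall(s)
for record in records:
    print(record)
```

This assumes every record starts with exactly six digits and ends with a phone number in the "(814) 865-8947" format, as in the sample data; input that deviates from that shape would need a different pattern.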

Upvotes: 1
