Reputation: 81
This is a line from a file. I want to extract only the URL after the word uri and the URL after smallPictureUrl so I can use them later, but I can't find a proper way to do it.
The asterisks stand for text, numbers, or both, and they differ on every line like this one, so they aren't helpful: there is no pattern to take advantage of.
{"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
"timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.49137931034483},\"photo\":{\"__type__
\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-*-*-*.*.*/*-*-*/*.jpg
\",\"width\":180,\"height\":135}}}",
"subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
In something simpler, like:
{"displayName":"Jim Test","firstName":"*","lastName":"*"}
I managed to extract the name (for example, Jim Test) after displayName using re.search('(?<="displayName":")(\w+) (\w+)', line),
but the other fields are much more complicated. Any direction or advice would be appreciated.
A full line looks exactly like this:
{"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s200x200/*_*_*_*.jpg","timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.40652557319224},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-photos-h-a.akamaihd.net/hphotos-ak-prn2/*_*_*_a.jpg\",\"width\":180,\"height\":120}}}","subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s100x100/*_*_*_a.jpg","contactId":"**==","contactType":"USER","friendshipStatus":"ARE_FRIENDS","graphApiWriteId":"contact_*:*:*","hugePictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s720x720/*_*_*_*.jpg","profileFbid":"*","isMobilePushable":"NO","lookupKey":null,"name":{"displayName":"* *","firstName":"*","lastName":"*"},"nameSearchTokens":["*","*"],"phones":[],"phoneticName":{"displayName":null,"firstName":null,"lastName":null},"isMemorialized":false,"communicationRank":0.4183731,"canViewerSendGift":false,"canMessage":true}
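The lookbehind idea from the question can be stretched to both URLs. Below is a minimal sketch (Python 3), assuming each line is read into a string `line` exactly as it appears in the file; the sample is a trimmed copy of the line above. The catch is that uri sits inside the stringified timelineCoverPhoto value, so in the file its quotes appear as `\"`, and the fixed-width lookbehind has to include those backslashes.

```python
import re

# Trimmed copy of the line from the question. r'' keeps each \" as a literal
# backslash followed by a quote, exactly as in the file.
line = (
    r'{"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s200x200/*_*_*_*.jpg",'
    r'"timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.40652557319224},\"photo\":{\"__type__\":{\"name\":\"Photo\"},'
    r'\"image_lowres\":{\"uri\":\"https://fbcdn-photos-h-a.akamaihd.net/hphotos-ak-prn2/*_*_*_a.jpg\",\"width\":180,\"height\":120}}}",'
    r'"subscribeStatus":"IS_SUBSCRIBED",'
    r'"smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s100x100/*_*_*_a.jpg"}'
)

# smallPictureUrl is a top-level key with plain quotes: take everything after
# "smallPictureUrl":" up to the next double quote.
small = re.search(r'(?<="smallPictureUrl":")[^"]+', line).group()

# uri is inside the stringified JSON, so the lookbehind matches \"uri\":\" and
# the value stops at the backslash that opens the closing \".
uri = re.search(r'(?<=\\"uri\\":\\")[^\\"]+', line).group()

print(small)
print(uri)
```

Lookbehinds in Python's re must have a fixed width, which is why the escaped and unescaped cases need separate patterns here.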
Upvotes: 0
Views: 137
Reputation: 98871
# See: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
import re
import urllib.request
GRUBER_URLINTEXT_PAT = re.compile(r'(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?\xab\xbb\u201c\u201d\u2018\u2019]))')
# findall returns one tuple of groups per match; element 0 is the whole URL
with urllib.request.urlopen("http://daringfireball.net/misc/2010/07/url-matching-regex-test-data.text") as resp:
    for raw_line in resp:
        line = raw_line.decode("utf-8", "replace")
        print([mgroups[0] for mgroups in GRUBER_URLINTEXT_PAT.findall(line)])
Upvotes: 2
Reputation: 3121
If you're not okay with using json, how about this?
>>> print(mytext)
{"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
"timelineCoverPhoto":"{"focus":{"x":0.5,"y":0.49137931034483},"photo":{"__type__
":{"name":"Photo"},"image_lowres":{"uri":"https://fbcdn-*-*-*.*.*/*-*-*/*.jpg
","width":180,"height":135}}}",
"subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
>>> uri = re.findall(r'uri\"\:\"[\'"]?([^\'" >]+)', mytext) #gets the uri
>>> smallpicurl = re.findall(r'smallPictureUrl\"\:\"[\'"]?([^\'" >]+)', mytext) # gets the smallPictureUrl
>>> ''.join(uri).rstrip()
'https://fbcdn-*-*-*.*.*/*-*-*/*.jpg' # uri
>>> ''.join(smallpicurl).rstrip()
'https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg' # smallPictureUrl
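The text shown above has plain quotes, but in the line from the question the quotes around uri are escaped as `\"` (timelineCoverPhoto is a string containing JSON), and on that raw text the `uri\"\:\"` pattern finds nothing. A minimal sketch (Python 3) of a pattern that accepts either form; `find_values` is a hypothetical helper name, and the sample strings are trimmed from the question.

```python
import re

def find_values(key, text):
    # Hypothetical helper: return every value stored under `key`, whether the
    # quotes around it are plain ("uri":"...") or escaped (\"uri\":\"...\").
    # Each \\? makes a backslash before a quote optional, and the captured
    # value stops at the first quote or backslash, so no trailing \" leaks in.
    pattern = r'\\?"' + re.escape(key) + r'\\?":\\?"([^"\\]+)'
    return re.findall(pattern, text)

# Trimmed from the question: the raw file text (escaped quotes) and the
# unescaped form used in the session above.
raw = (r'"timelineCoverPhoto":"{\"image_lowres\":{\"uri\":\"https://fbcdn-photos-h-a.akamaihd.net/hphotos-ak-prn2/*_*_*_a.jpg\",\"width\":180}}",'
       r'"smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s100x100/*_*_*_a.jpg"')
plain = '"image_lowres":{"uri":"https://fbcdn-*-*-*.*.*/*-*-*/*.jpg","width":180}'

# The pattern from this answer expects plain quotes, so it misses the raw text
print(re.findall(r'uri\"\:\"[\'"]?([^\'" >]+)', raw))  # []
print(find_values("uri", raw))
print(find_values("uri", plain))
print(find_values("smallPictureUrl", raw))
```

Because the capture stops at a backslash as well as a quote, the `''.join(...).rstrip()` cleanup step is no longer needed for single-line input.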
Upvotes: 1
Reputation: 1794
The value associated with timelineCoverPhoto seems to be stringified JSON, so you could do something admittedly ugly like this:
import json
# A hand-copied subset of the line; timelineCoverPhoto holds a JSON document as a string
s = {
    "subscribeStatus": "IS_SUBSCRIBED",
    "bigPictureUrl": "https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
    "timelineCoverPhoto": "{\"focus\":{\"x\":0.5,\"y\":0.49137931034483},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-*-*-*.*.*/*-*-*/*.jpg\",\"width\":180,\"height\":135}}}",
    "smallPictureUrl": "https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg"
}
# Decode the nested JSON string, then walk down to the uri
j = json.loads(s.get('timelineCoverPhoto'))
print("uri:", j.get('photo').get('image_lowres').get('uri'))
uri: https://fbcdn-*-*-*.*.*/*-*-*/*.jpg
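The dict does not have to be copied by hand. Assuming, as the line in the question suggests, that each line is one complete JSON object, json.loads can parse the line directly; timelineCoverPhoto then needs a second json.loads because its value is JSON stored as a string. A minimal sketch (Python 3); the sample is a trimmed copy of the question's line.

```python
import json

# Trimmed copy of the line from the question, exactly as it appears in the file
line = (
    r'{"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s200x200/*_*_*_*.jpg",'
    r'"timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.40652557319224},\"photo\":{\"__type__\":{\"name\":\"Photo\"},'
    r'\"image_lowres\":{\"uri\":\"https://fbcdn-photos-h-a.akamaihd.net/hphotos-ak-prn2/*_*_*_a.jpg\",\"width\":180,\"height\":120}}}",'
    r'"subscribeStatus":"IS_SUBSCRIBED",'
    r'"smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s100x100/*_*_*_a.jpg"}'
)

record = json.loads(line)                      # outer object -> dict
small_picture_url = record["smallPictureUrl"]  # plain top-level string

# The first json.loads turned the \" escapes into real quotes, leaving a JSON
# document in a string; decode it a second time to reach the nested keys.
cover = json.loads(record["timelineCoverPhoto"])
uri = cover["photo"]["image_lowres"]["uri"]

print("smallPictureUrl:", small_picture_url)
print("uri:", uri)
```

For a whole file, the same two json.loads calls would go inside a `for line in f:` loop. Unlike the regex approaches, this does not depend on how the quotes are escaped or on the order of the keys.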
Upvotes: 2