Reputation: 141
Python novice, so please be nice. I have a .txt file containing JSON-like data on one line:
{"marketing_package_url": "http://www.capitalpacific.com/inquiry/TrailsEndMarketplaceExecSummary.pdf", "title": "TRAILS END MARKETPLACE", "location": "OREGON CITY, OR"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf", "title": "YUKON VILLAGE", "location": "YUKON, OK"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/SouthPointPlazaExecSummary-CONFI.pdf", "title": "SOUTH POINT PLAZA", "location": "EVERETT, WA"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/HomeDepotBellinghamExecutiveSummary.pdf", "title": "HOME DEPOT - BELLINGHAM", "location": "BELLINGHAM, WA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Muncie-Marketplace-MuncieIN.pdf", "title": "MUNCIE MARKETPLACE", "location": "MUNCIE, IN"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-AugustaGA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "AUGUSTA, GA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-GainesvilleGA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "GAINESVILLE, GA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Texas-Strip-Center-Portfolio.pdf", "title": "TEXAS STRIP CENTER PORTFOLIO", "location": "VARIOUS LOCATIONS, TX"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/ArneyRetailCenterExecSummary.pdf", "title": "ARNEY RETAIL CENTER", "location": "WOODBURN, OR"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-LaGrangeGA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "LAGRANGE, GA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-LynchburgVA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "LYNCHBURG, VA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-RoanokeVA.pdf", "title": "WALMART 
NEIGHBORHOOD MARKET", "location": "ROANOKE, VA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-AshlandVA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "ASHLAND, VA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-OklahomaCityOK.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "OKLAHOMA CITY, OK"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/San-Angelo-Marketplace-SanAngeloTX.pdf", "title": "SAN ANGELO MARKETPLACE", "location": "SAN ANGELO, TX"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/KeizerVillageExecSummary.pdf", "title": "KEIZER VILLAGE", "location": "KEIZER, OR"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Bonanza-Shopping-Center-ClovisCA.pdf", "title": "BONANZA SHOPPING CENTER", "location": "CLOVIS, CA"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/WalgreensBellinghamExecSummary.pdf", "title": "WALGREENS", "location": "BELLINGHAM, WA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/The-OrchardCenter-TehachapiCA.pdf", "title": "THE ORCHARD CENTER", "location": "TEHACHAPI, CA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Cinetopia-VancouverWA.pdf", "title": "CINETOPIA", "location": "VANCOUVER, WA"}
What I am trying to do is get only the marketing package URLs into a list in a script, so that it comes out something like this:
list[0] = http://www.capitalpacific.com/inquiry/TrailsEndMarketplaceExecSummary.pdf
list[1] = http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf
list[2] = ...
I have tried json.loads, but it gives an error saying that there is extra data, or something along those lines. I believe this is because it is a .txt file and not formatted exactly like JSON. Any help is much appreciated, thank you.
EDIT: The JSON objects are all on one line. This was my first attempt at it, trying to split up the individual objects and then rejoin them:
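import json

result = []
with open("properties.txt") as f:
    # All of the objects sit on the first line of the file.
    j = next(f).strip()

# Splitting on "}{" drops the braces between neighbouring objects.
jlist = j.split("}{")
print(len(jlist))

# Put the braces back: the first piece lost its "}", the last lost its "{".
jlist = [jlist[0] + "}"] + ["{" + x + "}" for x in jlist[1:-1]] + ["{" + jlist[-1]]

for x in jlist:
    result.append(json.loads(x))
for x in result:
    print(x['title'])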
Upvotes: 2
Views: 82
Reputation: 6684
Here is a function that takes a string containing any number of JSON objects run into each other, parses each one, and yields the results one by one:
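import json

def get_json_objects(s):
    d = json.JSONDecoder()
    idx = 0
    while idx < len(s):
        # raw_decode does not skip leading whitespace, so step over it
        # (for example the newline at the end of a file).
        if s[idx].isspace():
            idx += 1
            continue
        # raw_decode returns the parsed value and the index just past it.
        j, idx = d.raw_decode(s, idx=idx)
        yield j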
Example:
>>> list(get_json_objects("[1,2][3,4]{}"))
[[1, 2], [3, 4], {}]
So you could use it like this:
with open("data.txt") as f:
    urls = [j["marketing_package_url"] for j in get_json_objects(f.read())]
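For context, the "extra data" error is not caused by the .txt extension. json.loads expects the string to hold exactly one JSON value and complains as soon as anything follows it, whereas raw_decode parses one value and reports where it stopped. A minimal sketch of the difference, assuming a shortened two-object sample (the example.com URLs are placeholders, not the real data):

```python
import json

# Shortened stand-in for the file contents; the example.com URLs are placeholders.
sample = (
    '{"marketing_package_url": "http://example.com/a.pdf", "title": "A"}'
    '{"marketing_package_url": "http://example.com/b.pdf", "title": "B"}'
)

# json.loads accepts exactly one JSON value, so the second object is an error.
try:
    json.loads(sample)
    error_message = None
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    error_message = str(exc)

# raw_decode parses one value and returns the index just past it,
# which is what lets a loop carry on from that position.
decoder = json.JSONDecoder()
first, end = decoder.raw_decode(sample)
second, end2 = decoder.raw_decode(sample, end)

print(error_message)
print(first["marketing_package_url"])
print(second["marketing_package_url"])
```

The second call starts at the index returned by the first, which is the same bookkeeping the generator above does in its loop.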
Upvotes: 1
Reputation: 67968
https?:\/\/[^"]+
If json is not working, try re.findall with the pattern above. See demo:
https://regex101.com/r/iS6jF6/7
import re
p = re.compile(r'https?:\/\/[^"]+', re.IGNORECASE | re.MULTILINE)
test_str = "{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/TrailsEndMarketplaceExecSummary.pdf\", \"title\": \"TRAILS END MARKETPLACE\", \"location\": \"OREGON CITY, OR\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf\", \"title\": \"YUKON VILLAGE\", \"location\": \"YUKON, OK\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/SouthPointPlazaExecSummary-CONFI.pdf\", \"title\": \"SOUTH POINT PLAZA\", \"location\": \"EVERETT, WA\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/HomeDepotBellinghamExecutiveSummary.pdf\", \"title\": \"HOME DEPOT - BELLINGHAM\", \"location\": \"BELLINGHAM, WA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Muncie-Marketplace-MuncieIN.pdf\", \"title\": \"MUNCIE MARKETPLACE\", \"location\": \"MUNCIE, IN\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-AugustaGA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"AUGUSTA, GA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-GainesvilleGA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"GAINESVILLE, GA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Texas-Strip-Center-Portfolio.pdf\", \"title\": \"TEXAS STRIP CENTER PORTFOLIO\", \"location\": \"VARIOUS LOCATIONS, TX\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/ArneyRetailCenterExecSummary.pdf\", \"title\": \"ARNEY RETAIL CENTER\", \"location\": \"WOODBURN, OR\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-LaGrangeGA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"LAGRANGE, GA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-LynchburgVA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"LYNCHBURG, 
VA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-RoanokeVA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"ROANOKE, VA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-AshlandVA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"ASHLAND, VA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-OklahomaCityOK.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"OKLAHOMA CITY, OK\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/San-Angelo-Marketplace-SanAngeloTX.pdf\", \"title\": \"SAN ANGELO MARKETPLACE\", \"location\": \"SAN ANGELO, TX\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/KeizerVillageExecSummary.pdf\", \"title\": \"KEIZER VILLAGE\", \"location\": \"KEIZER, OR\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Bonanza-Shopping-Center-ClovisCA.pdf\", \"title\": \"BONANZA SHOPPING CENTER\", \"location\": \"CLOVIS, CA\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/WalgreensBellinghamExecSummary.pdf\", \"title\": \"WALGREENS\", \"location\": \"BELLINGHAM, WA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/The-OrchardCenter-TehachapiCA.pdf\", \"title\": \"THE ORCHARD CENTER\", \"location\": \"TEHACHAPI, CA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Cinetopia-VancouverWA.pdf\", \"title\": \"CINETOPIA\", \"location\": \"VANCOUVER, WA\"}"
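# findall returns every URL in the text as a list of strings,
# not only the values of marketing_package_url.
urls = re.findall(p, test_str)
print(urls)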
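If the data ever gains other fields that contain URLs, the pattern can be tied to the key name so that only the marketing package URLs are captured. A rough sketch, assuming a shortened sample in which the example.com URLs and the "other_url" field are made up for illustration. It also assumes the values contain no escaped quotes or other JSON escapes, which a regex like this will not decode:

```python
import re

# Shortened stand-in for the real text; the example.com URLs and the
# "other_url" field are hypothetical and only here for illustration.
sample = (
    '{"marketing_package_url": "http://example.com/a.pdf", "title": "A", '
    '"other_url": "http://example.com/ignore-me"}'
    '{"marketing_package_url": "http://example.com/b.pdf", "title": "B"}'
)

# Capture only the string that follows the "marketing_package_url" key.
pattern = re.compile(r'"marketing_package_url"\s*:\s*"([^"]+)"')
urls = pattern.findall(sample)
print(urls)
```

Because the pattern has a single capture group, findall returns just the captured URLs rather than the whole match.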
Upvotes: 0