tsnn2d

Reputation: 141

Managing JSON-like data from a text file

Python novice, so please be nice. I have a .txt file containing JSON-like data on one line:

{"marketing_package_url": "http://www.capitalpacific.com/inquiry/TrailsEndMarketplaceExecSummary.pdf", "title": "TRAILS END MARKETPLACE", "location": "OREGON CITY, OR"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf", "title": "YUKON VILLAGE", "location": "YUKON, OK"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/SouthPointPlazaExecSummary-CONFI.pdf", "title": "SOUTH POINT PLAZA", "location": "EVERETT, WA"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/HomeDepotBellinghamExecutiveSummary.pdf", "title": "HOME DEPOT - BELLINGHAM", "location": "BELLINGHAM, WA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Muncie-Marketplace-MuncieIN.pdf", "title": "MUNCIE MARKETPLACE", "location": "MUNCIE, IN"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-AugustaGA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "AUGUSTA, GA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-GainesvilleGA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "GAINESVILLE, GA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Texas-Strip-Center-Portfolio.pdf", "title": "TEXAS STRIP CENTER PORTFOLIO", "location": "VARIOUS LOCATIONS, TX"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/ArneyRetailCenterExecSummary.pdf", "title": "ARNEY RETAIL CENTER", "location": "WOODBURN, OR"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-LaGrangeGA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "LAGRANGE, GA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-LynchburgVA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "LYNCHBURG, VA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-RoanokeVA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "ROANOKE, VA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-AshlandVA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "ASHLAND, VA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-OklahomaCityOK.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "OKLAHOMA CITY, OK"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/San-Angelo-Marketplace-SanAngeloTX.pdf", "title": "SAN ANGELO MARKETPLACE", "location": "SAN ANGELO, TX"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/KeizerVillageExecSummary.pdf", "title": "KEIZER VILLAGE", "location": "KEIZER, OR"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Bonanza-Shopping-Center-ClovisCA.pdf", "title": "BONANZA SHOPPING CENTER", "location": "CLOVIS, CA"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/WalgreensBellinghamExecSummary.pdf", "title": "WALGREENS", "location": "BELLINGHAM, WA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/The-OrchardCenter-TehachapiCA.pdf", "title": "THE ORCHARD CENTER", "location": "TEHACHAPI, CA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Cinetopia-VancouverWA.pdf", "title": "CINETOPIA", "location": "VANCOUVER, WA"}

What I am trying to do is get only the marketing package URLs into a list in a script, so that it comes out something like this:

list[0] = http://www.capitalpacific.com/inquiry/TrailsEndMarketplaceExecSummary.pdf

list[1] = http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf

list[2] = ...

I have tried json.loads, but it gives an error saying there is extra data, or something along those lines. I believe this is because it is a .txt file and not formatted exactly like JSON. Any help is much appreciated, thank you.
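For reference, the error can be reproduced without any file, which suggests the .txt extension is probably not the cause: json.loads expects exactly one JSON value and complains as soon as a second one follows it. A minimal sketch, using a made-up two-object string rather than the real data:

```python
import json

# Made-up sample: two JSON objects run together, like the file contents.
text = '{"title": "A"}{"title": "B"}'

try:
    json.loads(text)
    error = None
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    error = str(exc)

# The message starts with "Extra data" and points at the second object.
print(error)
```

Each object on its own is valid JSON; it is only the concatenation that json.loads rejects.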

EDIT: The JSON objects are all on one line. This was my first attempt, trying to split up the individual objects and then rejoin them:

import json

result = []
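with open("properties.txt") as f:
    j = next(f).strip()
    # Split on the "}{" boundary between adjacent objects.
    jlist = j.split("}{")
    print(len(jlist))
    # Put back the braces that split() removed (assumes at least two objects).
    jlist = [jlist[0] + "}"] + ["{" + x + "}" for x in jlist[1:-1]] + ["{" + jlist[-1]]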
    for x in jlist:
        result.append(json.loads(x))

for x in result:
    print(x['title'])

Upvotes: 2

Views: 82

Answers (2)

Andrew Magee

Reputation: 6684

Here is a function that takes a string containing any number of JSON objects run together, parses each one, and yields the results one by one:

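import json

def get_json_objects(s):
    d = json.JSONDecoder()
    idx = 0
    while idx < len(s):
        # raw_decode does not skip leading whitespace (e.g. a trailing newline).
        if s[idx].isspace():
            idx += 1
            continue
        # raw_decode returns the parsed value and the index just past it.
        j, idx = d.raw_decode(s, idx)
        yield j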

Example:

>>> list(get_json_objects("[1,2][3,4]{}"))
[[1, 2], [3, 4], {}]
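The loop above relies on raw_decode returning two things: the parsed value and the index at which parsing stopped. A small sketch of that behaviour on a made-up string (not the real data):

```python
import json

decoder = json.JSONDecoder()
text = '{"a": 1}{"b": 2}'  # made-up sample: two objects with nothing between them

# First call starts at index 0 and stops right after the first object.
first, end = decoder.raw_decode(text)

# Second call resumes from where the first one stopped.
second, end2 = decoder.raw_decode(text, end)

print(first, end)    # {'a': 1} 8
print(second, end2)  # {'b': 2} 16
```

Feeding the returned index back in as the next starting point is what lets the generator walk through the whole string one object at a time.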

So you could use it like this:

with open("data.txt") as f:
    urls = [j["marketing_package_url"] for j in get_json_objects(f.read())]

Upvotes: 1

vks

Reputation: 67968

https?:\/\/[^"]+

If json is not working, try re.findall with the pattern above. See the demo:

https://regex101.com/r/iS6jF6/7

import re
p = re.compile(r'https?:\/\/[^"]+', re.IGNORECASE | re.MULTILINE)
test_str = "{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/TrailsEndMarketplaceExecSummary.pdf\", \"title\": \"TRAILS END MARKETPLACE\", \"location\": \"OREGON CITY, OR\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf\", \"title\": \"YUKON VILLAGE\", \"location\": \"YUKON, OK\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/SouthPointPlazaExecSummary-CONFI.pdf\", \"title\": \"SOUTH POINT PLAZA\", \"location\": \"EVERETT, WA\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/HomeDepotBellinghamExecutiveSummary.pdf\", \"title\": \"HOME DEPOT - BELLINGHAM\", \"location\": \"BELLINGHAM, WA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Muncie-Marketplace-MuncieIN.pdf\", \"title\": \"MUNCIE MARKETPLACE\", \"location\": \"MUNCIE, IN\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-AugustaGA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"AUGUSTA, GA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-GainesvilleGA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"GAINESVILLE, GA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Texas-Strip-Center-Portfolio.pdf\", \"title\": \"TEXAS STRIP CENTER PORTFOLIO\", \"location\": \"VARIOUS LOCATIONS, TX\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/ArneyRetailCenterExecSummary.pdf\", \"title\": \"ARNEY RETAIL CENTER\", \"location\": \"WOODBURN, OR\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-LaGrangeGA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"LAGRANGE, GA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-LynchburgVA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"LYNCHBURG, VA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-RoanokeVA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"ROANOKE, VA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-AshlandVA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"ASHLAND, VA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-OklahomaCityOK.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"OKLAHOMA CITY, OK\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/San-Angelo-Marketplace-SanAngeloTX.pdf\", \"title\": \"SAN ANGELO MARKETPLACE\", \"location\": \"SAN ANGELO, TX\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/KeizerVillageExecSummary.pdf\", \"title\": \"KEIZER VILLAGE\", \"location\": \"KEIZER, OR\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Bonanza-Shopping-Center-ClovisCA.pdf\", \"title\": \"BONANZA SHOPPING CENTER\", \"location\": \"CLOVIS, CA\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/WalgreensBellinghamExecSummary.pdf\", \"title\": \"WALGREENS\", \"location\": \"BELLINGHAM, WA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/The-OrchardCenter-TehachapiCA.pdf\", \"title\": \"THE ORCHARD CENTER\", \"location\": \"TEHACHAPI, CA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Cinetopia-VancouverWA.pdf\", \"title\": \"CINETOPIA\", \"location\": \"VANCOUVER, WA\"}"

# findall returns a list of every URL matched in the string.
urls = p.findall(test_str)
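One caveat with this pattern is that it matches every URL in the text, whichever key it belongs to. If the data ever gained a second URL field, a pattern tied to the key name might be safer. A rough sketch, assuming the key is always spelled "marketing_package_url" and its value is a double-quoted string with no escaped quotes inside; the two-object sample is cut down from the question's data:

```python
import re

# Capture only the value that follows the "marketing_package_url" key.
# Assumption: the value never contains an escaped double quote.
pattern = re.compile(r'"marketing_package_url"\s*:\s*"([^"]+)"')

# Two objects taken from the question's data, run together as in the file.
sample = (
    '{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf", '
    '"title": "YUKON VILLAGE", "location": "YUKON, OK"}'
    '{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Cinetopia-VancouverWA.pdf", '
    '"title": "CINETOPIA", "location": "VANCOUVER, WA"}'
)

# With one capturing group, findall returns just the captured values.
urls = pattern.findall(sample)

for i, url in enumerate(urls):
    print("list[%d] = %s" % (i, url))
```

This is still text matching rather than parsing, so it would break on values containing escaped quotes; a real JSON parser is the more robust route when the objects are well formed.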

Upvotes: 0
