Mary

Reputation: 797

Selecting each record by its identifier string from a JSON-like string

I want to select the records tagged BBBBB@4##, AAAAA@5##, and AAAAA@6## separately.

JSON

x = {'d': 'BBBBB@4##{"pp-0": "1000", "pp-1": "1001", "pp-2": "1002", "pp-3": "1003", "pp-4": "1004", "pp-5": "1005", "pp-6": "1006", "pp-7": "1007", "pp-8": "1008", "pp-9": "1009", "pp-10": "1010", "pp-11": "1011", "pp-12": "1012", "pp-13": "1013", "pp-14": "1014", "pp-17": "1015", "pp-27": "1016"}AAAAA@5##{"pp-0": "1000", "pp-1": "1001", "pp-2": "1002", "pp-3": "1003", "pp-4": "1004", "pp-5": "1005", "pp-6": "1006", "pp-7": "1007", "pp-8": "1008", "pp-9": "1009", "pp-10": "1010", "pp-11": "1011", "pp-12": "1012", "pp-13": "1013", "pp-14": "1014", "pp-17": "1015", "pp-27": "1016"}AAAAA@6##{"pp-0": "1000", "pp-1": "1001", "pp-2": "1002", "pp-3": "1003", "pp-4": "1004", "pp-5": "1005", "pp-6": "1006", "pp-7": "1007", "pp-8": "1008", "pp-9": "1009", "pp-10": "1010", "pp-11": "1011", "pp-12": "1012", "pp-13": "1013", "pp-14": "1014", "pp-17": "1015", "pp-27": "1016"}'}

Python Code

x['d']['AAAAA@5##']

Expected Selected Result

{"pp-0": "1000", "pp-1": "1001", "pp-2": "1002", "pp-3": "1003", "pp-4": "1004", "pp-5": "1005", "pp-6": "1006", "pp-7": "1007", "pp-8": "1008", "pp-9": "1009", "pp-10": "1010", "pp-11": "1011", "pp-12": "1012", "pp-13": "1013", "pp-14": "1014", "pp-17": "1015", "pp-27": "1016"}

Upvotes: 0

Views: 58

Answers (2)

azro

Reputation: 54168

I'd suggest a simple regex to extract each AAAAA@X##{...} block, then use the matches to build a new dictionary:

import json
import re

x = {'d': 'AAAAA@4##{"pp-0": "1000", "pp-1": "1001", "pp-2": "1002", "pp-3": "1003", "pp-4": "1004", '
          '"pp-5": "1005", "pp-6": "1006", "pp-7": "1007", "pp-8": "1008", "pp-9": "1009", "pp-10": "1010", '
          '"pp-11": "1011", "pp-12": "1012", "pp-13": "1013", "pp-14": "1014", "pp-17": "1015", "pp-27": "1016"}'
          'AAAAA@5##{"pp-0": "1000", "pp-1": "1001", "pp-2": "1002", "pp-3": "1003", "pp-4": "1004", '
          '"pp-5": "1005", "pp-6": "1006", "pp-7": "1007", "pp-8": "1008", "pp-9": "1009", "pp-10": "1010", '
          '"pp-11": "1011", "pp-12": "1012", "pp-13": "1013", "pp-14": "1014", "pp-17": "1015", "pp-27": "1016"}'
          'AAAAA@6##{"pp-0": "1000", "pp-1": "1001", "pp-2": "1002", "pp-3": "1003", "pp-4": "1004", '
          '"pp-5": "1005", "pp-6": "1006", "pp-7": "1007", "pp-8": "1008", "pp-9": "1009", "pp-10": "1010", '
          '"pp-11": "1011", "pp-12": "1012", "pp-13": "1013", "pp-14": "1014", "pp-17": "1015", "pp-27": "1016"}'}

result = {}
for key, val in re.findall(r"(AAAAA@\d+##)({.*?})", x['d']):
    result[key] = json.loads(val)

print(result.keys())  # dict_keys(['AAAAA@4##', 'AAAAA@5##', 'AAAAA@6##'])
print(result['AAAAA@4##'])  # {'pp-0': '1000', ...  'pp-27': '1016'}

If the identifier can take another form, change the regex. Here are a few examples:

  • ([A-Z]{5}@\d+##) : 5 uppercase letters instead of 5 A
  • ([A-Za-z]{2,5}@\d+##) : 2 to 5 letters instead of 5 A
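As a minimal sketch of the first variant, the generalized pattern can be applied to a string that mixes BBBBB and AAAAA identifiers, as in the question. The `raw` value below is a shortened, hypothetical stand-in for the real data, and the names `pattern` and `records` are only illustrative. It assumes each payload is a flat object with no `}` inside its values.

```python
import json
import re

# Shortened, hypothetical stand-in for x['d'] (same layout, fewer keys).
raw = ('BBBBB@4##{"pp-0": "1000", "pp-1": "1001"}'
       'AAAAA@5##{"pp-0": "2000"}'
       'AAAAA@6##{"pp-0": "3000"}')

# Group 1: 5 uppercase letters, '@', digits, '##'.
# Group 2: a flat {...} payload; the non-greedy .*? stops at the first '}'.
pattern = re.compile(r"([A-Z]{5}@\d+##)(\{.*?\})")

# Parse each payload with json.loads and key it by its identifier.
records = {key: json.loads(val) for key, val in pattern.findall(raw)}

print(list(records))
print(records['AAAAA@5##'])
```

If a payload could contain nested objects or a `}` inside a string value, the non-greedy match would cut it short, so this only fits data shaped like the sample.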

Upvotes: 1

ThoSil

Reputation: 134

The value under the key 'd' is a string, and it is not even valid JSON.

The right answer is to fix the data format before trying to use it.

... but if you can't, you'll have to try to convert it to valid JSON like this (which is far from being bulletproof):

>>> import json
>>> x['d'] = json.loads('{"' + x['d'].replace('##{', '##": {').replace('}A', '}, "A') + "}")
>>> x['d']['AAAAA@5##']
{'pp-0': '1000', 'pp-1': '1001', 'pp-2': '1002', 'pp-3': '1003', 'pp-4': '1004', 'pp-5': '1005', 'pp-6': '1006', 'pp-7': '1007', 'pp-8': '1008', 'pp-9': '1009', 'pp-10': '1010', 'pp-11': '1011', 'pp-12': '1012', 'pp-13': '1013', 'pp-14': '1014', 'pp-17': '1015', 'pp-27': '1016'}
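If the format really can't be fixed at the source, a somewhat sturdier alternative to string replacement might be the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and reports where it ended. The sketch below is only an illustration: `split_records` is a hypothetical helper name, and it assumes every identifier is immediately followed by a JSON object and contains no `{` itself.

```python
import json

def split_records(raw):
    """Split 'ID##{...}ID##{...}' into {identifier: parsed object}.

    Hypothetical helper; assumes identifiers never contain '{'.
    """
    decoder = json.JSONDecoder()
    records = {}
    pos = 0
    while True:
        brace = raw.find('{', pos)   # the identifier runs up to the next '{'
        if brace == -1:              # no further payloads
            break
        key = raw[pos:brace].strip()
        # raw_decode returns (parsed object, index just past the object)
        obj, pos = decoder.raw_decode(raw, brace)
        records[key] = obj
    return records

sample = 'BBBBB@4##{"pp-0": "1000"}AAAAA@5##{"pp-0": "2000"}'
print(split_records(sample)['AAAAA@5##'])
```

Because the JSON parser itself decides where each object ends, this does not depend on the next identifier starting with a particular letter, and nested objects are handled as well.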

Upvotes: 1
