John Monte

Reputation: 27

Make text file with inconsistent structure into JSON object with inconsistent number of items

I have a file structured like this, with hundreds of lines of data:

RHSA-2019:1797 CVE-2017-17485,CVE-2018-12022,CVE-2018-12023,CVE-2018-14718,CVE-2018-14719,CVE-2018-19360,CVE-2018-19361,CVE-2018-19362 cpe:/a:redhat:jboss_bpms:6.4

The only consistent things in the data are the spaces between the three fields I want to separate and the commas between the items in the middle field. The first field, labeled RHSA, always holds exactly one item; the CVE field varies from 1 to 20 items, as does the CPE field.

I have tried to split the strings up using split(), but I'm sure this can be done in one step with Python, since the data set is inconsistent in the number of items but not in structure.

I split the data by space using

data = rh.split()
for temp in data:
    print(temp)

so now I have

RHSA-2019:1797
CVE-2017-17485,CVE-2018-12022,CVE-2018-12023,CVE-2018-14718,CVE-2018-14719,CVE-2018-19360,CVE-2018-19361,CVE-2018-19362
cpe:/a:redhat:jboss_bpms:6.4

where each field is on a separate line, so ideally I would like to loop over every 3 lines and put the data into a JSON structure like the one below:

[{"RHSA":{ "RHSA-2019:1797},
 {"CVE" :{ "CVE-2017-17485",
           "CVE-2018-12022",
           "CVE-2018-12023",
           "CVE-2018-14718",
           "CVE-2018-14719",
           "CVE-2018-19360",
           "CVE-2018-19361",
           "CVE-2018-19362" },
 {"CPE" :{ "cpe:/a:redhat:jboss_bpms:6.4"}]

Upvotes: 0

Views: 78

Answers (1)

Andrej Kesely

Reputation: 195543

The JSON in your example isn't valid, but this script produces something similar:

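import json
import re
from collections import defaultdict

line = 'RHSA-2019:1797 CVE-2017-17485,CVE-2018-12022,CVE-2018-12023,CVE-2018-14718,CVE-2018-14719,CVE-2018-19360,CVE-2018-19361,CVE-2018-19362 cpe:/a:redhat:jboss_bpms:6.4'

# Maps each field label (RHSA, CVE, CPE) to the list of items found for it.
d = defaultdict(list)

for field in line.split():
    # The leading run of word characters ('RHSA', 'CVE', 'cpe') is the label;
    # upper() normalizes 'cpe' to 'CPE'.
    label = re.findall(r'^(\w+)', field)[0].upper()
    # Items within a field are comma-separated.
    d[label].extend(field.split(','))

print(json.dumps(d, indent=4))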

Prints:

{
    "RHSA": [
        "RHSA-2019:1797"
    ],
    "CVE": [
        "CVE-2017-17485",
        "CVE-2018-12022",
        "CVE-2018-12023",
        "CVE-2018-14718",
        "CVE-2018-14719",
        "CVE-2018-19360",
        "CVE-2018-19361",
        "CVE-2018-19362"
    ],
    "CPE": [
        "cpe:/a:redhat:jboss_bpms:6.4"
    ]
}
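
Since the question mentions a file with hundreds of such lines, here is a minimal sketch of how the same idea might be applied line by line to build a list of objects. It assumes every non-blank line has exactly three space-separated fields in the order RHSA, CVE, CPE, and that multiple CPE entries are comma-separated like the CVE entries (the question does not show this). The names parse_line and parse_lines are made up for illustration.

```python
import json


def parse_line(line):
    """Turn one 'RHSA CVEs CPEs' line into a dict of lists.

    Assumes exactly three whitespace-separated fields; a line with any
    other number of fields raises ValueError on unpacking.
    """
    rhsa, cves, cpes = line.split()
    return {
        "RHSA": [rhsa],
        "CVE": cves.split(","),
        # Assumption: multiple CPE entries are comma-separated too.
        "CPE": cpes.split(","),
    }


def parse_lines(lines):
    """Parse an iterable of lines, skipping blank ones."""
    return [parse_line(line) for line in lines if line.strip()]


# The sample line from the question, plus a blank line to show skipping.
sample = [
    "RHSA-2019:1797 CVE-2017-17485,CVE-2018-12022,CVE-2018-12023,"
    "CVE-2018-14718,CVE-2018-14719,CVE-2018-19360,CVE-2018-19361,"
    "CVE-2018-19362 cpe:/a:redhat:jboss_bpms:6.4\n",
    "\n",
]

records = parse_lines(sample)
print(json.dumps(records, indent=4))
```

For a real file, an open file object can be passed straight to parse_lines, for example inside "with open(path) as f:", because iterating over a file yields its lines. Unpacking into three names fails loudly on a malformed line, which may or may not be what you want for your data.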

Upvotes: 1
