Reputation: 27
I have a file with hundreds of lines of data, each structured like this:
RHSA-2019:1797 CVE-2017-17485,CVE-2018-12022,CVE-2018-12023,CVE-2018-14718,CVE-2018-14719,CVE-2018-19360,CVE-2018-19361,CVE-2018-19362 cpe:/a:redhat:jboss_bpms:6.4
The only consistent things in the data are the spaces between the three fields I want to separate and the commas between items within a field. The first field (labeled RHSA) always has exactly one item; the CVE field has between 1 and 20 items, and so does the CPE field.
I have tried to split the strings up using split(), but I'm sure this can be done in one step in Python, since the data varies in the number of items but not in structure.
I split the data by space using
data = rh.split()
for temp in data:
    print(temp)
so now I have
RHSA-2019:1797
CVE-2017-17485,CVE-2018-12022,CVE-2018-12023,CVE-2018-14718,CVE-2018-14719,CVE-2018-19360,CVE-2018-19361,CVE-2018-19362
cpe:/a:redhat:jboss_bpms:6.4
where each field is on a separate line. Ideally, I would like to loop over every 3 lines and put the data into JSON like below:
[{"RHSA":{ "RHSA-2019:1797},
{"CVE" :{ "CVE-2017-17485",
"CVE-2018-12022",
"CVE-2018-12023",
"CVE-2018-14718",
"CVE-2018-14719",
"CVE-2018-19360",
"CVE-2018-19361",
"CVE-2018-19362" },
{"CPE" :{ "cpe:/a:redhat:jboss_bpms:6.4"}]
Upvotes: 0
Views: 78
Reputation: 195543
The JSON in your example isn't valid JSON, but this script produces something similar:
import json
import re
from collections import defaultdict

line = 'RHSA-2019:1797 CVE-2017-17485,CVE-2018-12022,CVE-2018-12023,CVE-2018-14718,CVE-2018-14719,CVE-2018-19360,CVE-2018-19361,CVE-2018-19362 cpe:/a:redhat:jboss_bpms:6.4'

# Group the comma-separated items of each field under its leading prefix.
d = defaultdict(list)
for i in line.split():
    # The first run of word characters is the prefix; upper() turns 'cpe' into 'CPE'.
    d[re.findall(r'^(\w+)', i)[0].upper()].extend(i.split(','))

print(json.dumps(d, indent=4))
Prints:
{
"RHSA": [
"RHSA-2019:1797"
],
"CVE": [
"CVE-2017-17485",
"CVE-2018-12022",
"CVE-2018-12023",
"CVE-2018-14718",
"CVE-2018-14719",
"CVE-2018-19360",
"CVE-2018-19361",
"CVE-2018-19362"
],
"CPE": [
"cpe:/a:redhat:jboss_bpms:6.4"
]
}
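Since the question mentions hundreds of such lines, here is a minimal sketch of one way to apply the same idea to every line. It assumes each non-blank line has exactly three space-separated fields in the order RHSA, CVE, CPE, and that multiple CPEs are comma-separated like the CVEs (the question does not show such a line). The names parse_line, parse_lines and sample are made up for illustration, and this version uses plain split() instead of the regex.

```python
import json


def parse_line(line):
    """Split one 'RHSA CVEs CPEs' line into a dict of lists."""
    # Assumption: exactly three whitespace-separated fields per line;
    # anything else raises ValueError on unpacking.
    rhsa, cves, cpes = line.split()
    return {
        "RHSA": [rhsa],
        "CVE": cves.split(","),
        "CPE": cpes.split(","),  # assumption: CPEs are comma-separated too
    }


def parse_lines(lines):
    """Parse every non-blank line into its own record."""
    return [parse_line(line) for line in lines if line.strip()]


# Hypothetical sample input; an open file object can be passed the same way.
sample = [
    "RHSA-2019:1797 CVE-2017-17485,CVE-2018-12022 cpe:/a:redhat:jboss_bpms:6.4",
    "",
]
records = parse_lines(sample)
print(json.dumps(records, indent=4))
```

This yields a JSON list with one object per input line, which may be closer to the list-shaped output sketched in the question. A line with more or fewer than three fields fails loudly rather than being silently misparsed.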
Upvotes: 1