Reputation: 1343
I'm parsing a file and collecting stats. I want to store those stats in a nested dict whose innermost values are lists, and extend those lists as I process the file.
For instance, my dict structure is something like this:
data_dict
{
    "aa1": {'aa': [], 'bb': []},
    "aa2": {'ab': [], 'ba': []},
}
Now, as I parse the file, I want to append each value to the innermost list. For instance, after the first occurrence of the data my dict should look like this:
data_dict
{
    "aa1": {'aa': ['a0'], 'bb': ['a1']},
    "aa2": {'ab': ['b0'], 'ba': ['b1']},
}
and after the second occurrence, something like this:
data_dict
{
    "aa1": {'aa': ['a0', 'a01'], 'bb': ['a1', 'a11']},
    "aa2": {'ab': ['b0', 'b01'], 'ba': ['b1', 'b11']},
}
Also, I'm not initializing the dict keys up front; I create them at the first occurrence of a match. Can anyone suggest how to achieve this?
Note: I'm using autovivification to initialize data_dict, which starts out empty.
This is the sample data I'm trying to parse:
DATETIME TYPE TAG COUNT MEAN 1% 10% 20% 30% 40% 50% 60% 70% 80% 90% 99%
20151109044056 LS_I aa8 57 80,493,122 8,931,000 8,937,000 8,944,000 8,974,000 9,073,000 21,262,000 28,419,000 35,794,000 148,920,000 316,408,000 447,902,000
20151109044056 LS_I aa0 6,893 9,008,024 8,862,000 8,913,000 8,941,000 8,964,000 8,984,000 9,006,000 9,028,000 9,049,000 9,071,000 9,102,000 9,170,000
20151109044056 LS_I aa1 6,062 9,018,094 8,867,000 8,913,000 8,938,000 8,961,000 8,983,000 9,003,000 9,025,000 9,048,000 9,071,000 9,103,000 9,175,000
20151109044056 LS_I aa2 2,776 9,030,621 8,929,000 8,967,000 8,987,000 8,999,000 9,012,000 9,024,000 9,037,000 9,050,000 9,065,000 9,087,000 9,161,000
20151109044056 LS_I aa3 1,074 9,028,744 8,925,000 8,970,000 8,988,000 9,002,000 9,016,000 9,026,000 9,039,000 9,051,000 9,067,000 9,089,000 9,138,000
20151109044056 LS_I aa4 6,060 9,003,651 8,874,000 8,935,000 8,958,000 8,976,000 8,991,000 9,005,000 9,019,000 9,033,000 9,049,000 9,071,000 9,121,000
20151109044056 LS_I aa5 5,453 9,003,993 8,874,000 8,936,000 8,959,000 8,976,000 8,991,000 9,004,000 9,018,000 9,032,000 9,048,000 9,071,000 9,126,000
20151109044056 LS_I aa6 16,384 328 111 165 190 208 227 253 301 362 434 551 997
20151109044056 LS_I aa7 16,384 316 58 65 70 76 87 137 308 395 512 702 1,562
So the first key of my dict is the TAG column, the second key is one of the % columns, and the value under that key is the list of all instances of that value in the complete file.
This is my processing code, which is not working:
while re.match(r"\d{14}\s.*", curr_line):
    lat_data = curr_line.split()
    tag = lat_data[header.index("TAG")]
    for item in range(len(header)):
        col = header[item]
        if '%' in col or\
           "COUNT" in col or\
           "MEAN" in col:
            self.data_dict[tag][col].append(lat_data[item])
    curr_line = lat_file.next()
Reputation: 155353
First off: has_key has been deprecated for ages (gone in Py3); you can use direct in checks. Secondly, what you were trying to do with has_key doesn't make sense: [tag][col] has nothing to index into. On its own, Python reads it as the single-element list [tag] subscripted by col, which fails at runtime when col is a string. The fix for the test is to check each component individually (after which you can append, since you know the value exists):
if tag in self.data_dict and col in self.data_dict[tag]:
    self.data_dict[tag][col].append(whatever_you_want_to_append)
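Since the question says the keys are created on first occurrence rather than up front, the same check can be folded into dict.setdefault, which inserts an empty container only when the key is missing. A minimal sketch with a plain dict; append_stat is a made-up helper name for illustration, not something from the question's code:

```python
def append_stat(data_dict, tag, col, value):
    """Append value to data_dict[tag][col], creating missing levels first."""
    # setdefault returns the existing value for the key, or inserts the
    # given default and returns it, so the chain always ends at a list.
    data_dict.setdefault(tag, {}).setdefault(col, []).append(value)


data_dict = {}
append_stat(data_dict, "aa1", "aa", "a0")
append_stat(data_dict, "aa1", "aa", "a01")
append_stat(data_dict, "aa1", "bb", "a1")
print(data_dict)  # {'aa1': {'aa': ['a0', 'a01'], 'bb': ['a1']}}
```

Unlike the explicit in test, this never skips a value: the first append for a new tag or column creates the list it needs.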
Side-note: You almost never want a loop of the form for i in range(len(something)); that's a symptom of coming from a C-style for loop background. You're not actually using the index for anything besides getting the value, so replace:
for item in range(len(header)):
    col = header[item]
with:
for col in header:
That runs faster, is more idiomatic, etc. If you need the index too for some reason, that's what enumerate is for:
for i, col in enumerate(header):
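As an illustration of where the index is actually useful, here is a sketch that works out once which positions hold the stat columns, using the header line from the question. The name wanted is just illustrative, and the COUNT/MEAN test uses exact equality instead of the question's substring check:

```python
header = ("DATETIME TYPE TAG COUNT MEAN "
          "1% 10% 20% 30% 40% 50% 60% 70% 80% 90% 99%").split()

# Collect (index, column name) pairs for the stat columns only.
wanted = [(i, col) for i, col in enumerate(header)
          if "%" in col or col in ("COUNT", "MEAN")]

print(wanted[:3])  # [(3, 'COUNT'), (4, 'MEAN'), (5, '1%')]
```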
UPDATE: You updated the question with more info, so it looks like you need to iterate lat_data in parallel with header. In that case, do:
for col, lat in zip(header, lat_data):
    ...
    if tag in self.data_dict and col in self.data_dict[tag]:
        self.data_dict[tag][col].append(lat)
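For a complete picture, here is a rough end-to-end sketch over two rows copied from the question's sample data. It makes one assumption loudly: I can't see your autovivification class, so a nested collections.defaultdict stands in for it, with list as the innermost factory. That means the lists are created on first access and no membership check is needed:

```python
import re
from collections import defaultdict

# Header plus two data rows copied from the sample in the question.
lines = [
    "DATETIME TYPE TAG COUNT MEAN 1% 10% 20% 30% 40% 50% 60% 70% 80% 90% 99%",
    "20151109044056 LS_I aa6 16,384 328 111 165 190 208 227 253 301 362 434 551 997",
    "20151109044056 LS_I aa7 16,384 316 58 65 70 76 87 137 308 395 512 702 1,562",
]

# tag -> column -> list of values (stand-in for the autovivifying dict).
data_dict = defaultdict(lambda: defaultdict(list))

header = lines[0].split()
tag_idx = header.index("TAG")

for line in lines[1:]:
    if not re.match(r"\d{14}\s", line):
        continue  # skip anything that is not a data row
    lat_data = line.split()
    tag = lat_data[tag_idx]
    # Walk header names and row values in lockstep.
    for col, lat in zip(header, lat_data):
        if "%" in col or col in ("COUNT", "MEAN"):
            data_dict[tag][col].append(lat)

print(data_dict["aa6"]["50%"])    # ['253']
print(data_dict["aa7"]["COUNT"])  # ['16,384']
```

The values stay as strings here, thousands separators included; converting them to numbers is a separate step if you need it.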