Reputation: 191
Problem: A command generates a table that makes it very hard to script around.
Solution: Convert the table into Python dictionaries, which are far easier to script against. These tables can contain 1-20 virtual drives, and attributes such as "Name" may not be set.
Example Table:
Virtual Drives = 4
VD LIST :
=======
----------------------------------------------------------
DG/VD TYPE   State Access Consist Cache sCC     Size Name
----------------------------------------------------------
0/0   RAID1  Optl  RW     No      RWTD  -   1.818 TB one
1/1   RAID1  Optl  RW     No      RWTD  -   1.818 TB two
2/2   RAID1  Optl  RW     No      RWTD  -   1.818 TB three
3/3   RAID1  Optl  RW     No      RWTD  -   1.818 TB four
4/4   RAID10 Reblg RW     No      RWTD  -   4.681 TB
----------------------------------------------------------
Example Dictionary:
{"DG/VD":"0/0", "TYPE":"RAID1", "State":"Optl", "Access":"RW", "Consist":"No", "Cache":"RWTD", "sCC":"-", "Size":"1.818 TB", "Name":"one"}
{"DG/VD":"4/4", "TYPE":"RAID10", "State":"Reblg", "Access":"RW", "Consist":"No", "Cache":"RWTD", "sCC":"-", "Size":"4.681 TB", "Name":""}
There would be one dictionary for each virtual drive. How best to solve this issue?
I have some ideas. First, search for the table header and split it on whitespace to define one list. Second, search for each virtual drive row matching "number/number" and split it on whitespace to define a second list. However, 'Size' will need special handling, since the space between the number and "TB" must not be treated as a separator.
Next, zip the two lists together to generate a dictionary. Does anybody have a better idea for manipulating this text?
# Create a list of all the headers in the virtual disk table
get_table_header = " /c0/vall show | awk '/^DG\/VD/'"
table_header_values = console_info(utility + get_table_header).split()
['DG/VD', 'TYPE', 'State', 'Access', 'Consist', 'Cache', 'sCC', 'Size', 'Name']
# Create a list of all virtual drives
get_virtual_drives = " /c0/vall show | awk '/^[0-9]\/[0-9]/'"
virtual_drive_values = console_info(utility + get_virtual_drives).split()
["0/0", "RAID1", "Optl", "RW", "No", "RWTD", "-", "1.818", "TB", "0"]
Upvotes: 5
Views: 31895
Reputation: 3277
You can try something like this:
import re
lines = re.split("\n", data)
for line in lines[8:-1]:
fields = re.split(" +",line)
print(fields)
where data contains your table.
The logic is to split the table into single lines, then split every line into fields using two or more spaces as the separator (note the re.split("  +", line)). The trick is to start from the 8th line and stop before the last line.
Once you have each line split into a list containing its fields, building a dictionary is simple.
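A minimal sketch of that last step, with a couple of data rows inlined so it runs standalone (the header list is taken from the question's table):

```python
import re

# Hypothetical stand-in for two already-extracted data rows.
data = ("0/0   RAID1   Optl   RW   No   RWTD   -   1.818 TB   one\n"
        "4/4   RAID10  Reblg  RW   No   RWTD   -   4.681 TB")

header = ['DG/VD', 'TYPE', 'State', 'Access', 'Consist', 'Cache', 'sCC', 'Size', 'Name']
result = []
for line in data.split("\n"):
    # Two or more spaces as the separator keeps "1.818 TB" in one field.
    fields = re.split("  +", line)
    result.append(dict(zip(header, fields)))
print(result)
```

Note that zip simply drops the 'Name' key when a row has no name; pad the fields with '' if that key must always be present.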
Upvotes: 0
Reputation:
One way to do it is by reading the table into a pandas DataFrame and using its to_dict method to convert it to a dict.
This requires copying the raw table to a file, removing the dashed lines, and then adding an extra column header after Size (perhaps called Units) to accommodate the 'TB' data. Alternatively, skip the extra column and remove or replace the whitespace (perhaps with a dash) between each Size value and 'TB'.
Then the file can be loaded into a pandas DataFrame (df) as a '\s+'-separated csv file with pd.read_csv, and converted to a dict with the to_dict method. Passing orient='index' yields one dict per row; it is equivalent to converting the transpose df.T (the same as df.transpose()), since transposing a 2D table just interchanges its columns and rows.
Starting with table.txt containing:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1/1 RAID1 Optl RW No RWTD - 1.818 TB two
2/2 RAID1 Optl RW No RWTD - 1.818 TB three
3/3 RAID1 Optl RW No RWTD - 1.818 TB four
The following code converts it to a dict named table_dict:
import pandas as pd
table_dict = pd.read_csv('table.txt',sep='\s+').to_dict(orient= 'index')
table_dict consists of 4 dicts with keys in range(4) and looks like:
import pprint
pprint.pprint(table_dict)
{0: {'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '0/0',
'Name': 'one',
'Size': 1.818,
'State': 'Optl',
'TYPE': 'RAID1',
'Units': 'TB',
'sCC': '-'},
1: {'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '1/1',
'Name': 'two',
'Size': 1.818,
'State': 'Optl',
'TYPE': 'RAID1',
'Units': 'TB',
'sCC': '-'},
2: {'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '2/2',
'Name': 'three',
'Size': 1.818,
'State': 'Optl',
'TYPE': 'RAID1',
'Units': 'TB',
'sCC': '-'},
3: {'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '3/3',
'Name': 'four',
'Size': 1.818,
'State': 'Optl',
'TYPE': 'RAID1',
'Units': 'TB',
'sCC': '-'}}
pandas DataFrame has additional methods for conversion to other formats including JSON, HTML, SQL and more.
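For example, the same data can be emitted as JSON (a sketch: the preprocessed table is inlined via io.StringIO here rather than read from table.txt):

```python
import io
import pandas as pd

# Same preprocessed table as above, held in memory instead of a file.
table = """DG/VD TYPE State Access Consist Cache sCC Size Units Name
0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1/1 RAID1 Optl RW No RWTD - 1.818 TB two"""

df = pd.read_csv(io.StringIO(table), sep=r'\s+')
print(df.to_json(orient='records'))  # one JSON object per virtual drive
```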
Upvotes: 3
Reputation: 180391
from itertools import dropwhile, takewhile

with open("test.txt") as f:
    dp = dropwhile(lambda x: not x.startswith("-"), f)
    next(dp)                  # skip ----
    names = next(dp).split()  # get header names
    next(f)                   # skip -----
    out = []
    for line in takewhile(lambda x: not x.startswith("-"), f):
        a, b = line.rsplit(None, 1)
        out.append(dict(zip(names, a.split(None, 7) + [b])))
Output:
from pprint import pprint as pp
pp(out)
[{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '0/0',
'Name': 'one',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '1/1',
'Name': 'two',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '2/2',
'Name': 'three',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '3/3',
'Name': 'four',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'}]
If you want to maintain order, use an OrderedDict (from collections import OrderedDict):
out = [OrderedDict(zip(names, line.split()))
       for line in takewhile(lambda x: not x.startswith("-"), f)]
For missing Name values as per your edit:
from itertools import dropwhile, takewhile

with open("test.txt") as f:
    dp = dropwhile(lambda x: not x.startswith("-"), f)
    next(dp)                  # skip ----
    names = next(dp).split()  # get header names
    next(f)                   # skip -----
    out = []
    for line in takewhile(lambda x: not x.startswith("-"), f):
        a, b = line.rsplit(" ", 1)
        out.append(dict(zip(names, a.rstrip().split(None, 7) + [b.rstrip()])))
Output:
[{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '0/0',
'Name': 'one',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '1/1',
'Name': 'two',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '2/2',
'Name': 'three',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '3/3',
'Name': 'four',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '4/4',
'Name': '',
'Size': '4.681 TB',
'State': 'Reblg',
'TYPE': 'RAID10',
'sCC': '-'}]
This will also handle lines with multiple spaces between "TB" and the Name column value, e.g. "1.818 TB    one".
Upvotes: 7
Reputation: 16776
There is a pandas function for exactly this, read_fwf: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_fwf.html
import pandas as pd
data = pd.read_fwf('filename.txt')
But you have to preprocess the table to this format:
DG/VD TYPE  State Access Consist Cache sCC  Size Units Name
0/0   RAID1 Optl  RW     No      RWTD  -   1.818 TB    one
1/1   RAID1 Optl  RW     No      RWTD  -   1.818 TB    two
2/2   RAID1 Optl  RW     No      RWTD  -   1.818 TB    three
3/3   RAID1 Optl  RW     No      RWTD  -   1.818 TB    four
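That preprocessing can itself be scripted. A sketch, under the assumption that data rows are exactly the lines starting with a digit; the header (with the extra Units column) is supplied via the names= keyword instead of being read from the file:

```python
import io
import pandas as pd

raw = """Virtual Drives = 4
VD LIST :
=======
----------------------------------------------------------
DG/VD TYPE State Access Consist Cache sCC Size Name
----------------------------------------------------------
0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1/1 RAID1 Optl RW No RWTD - 1.818 TB two
----------------------------------------------------------"""

# Keep only the data rows (they start with a digit) and supply the header
# ourselves, with the extra Units column for the "TB" token.
names = ['DG/VD', 'TYPE', 'State', 'Access', 'Consist', 'Cache',
         'sCC', 'Size', 'Units', 'Name']
rows = [line for line in raw.splitlines() if line[:1].isdigit()]
df = pd.read_fwf(io.StringIO('\n'.join(rows)), names=names)
print(df)
```

This relies on read_fwf inferring the column widths from the data rows, which works here because every row has the same layout.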
Upvotes: -1
Reputation: 123393
You could use the struct
module to parse the data in the lines of the table, as shown below. The code stores each record in an OrderedDict
to preserve field order in the dictionaries produced, but doing that is optional. The "Name" field does not have to be present.
from __future__ import print_function
from collections import OrderedDict
import json # for pretty-printing results
import struct
from textwrap import dedent
table = dedent("""
Virtual Drives = 4
VD LIST :
=======
----------------------------------------------------------
DG/VD TYPE  State Access Consist Cache sCC     Size Name
----------------------------------------------------------
0/0   RAID1 Optl  RW     No      RWTD  -   1.818 TB one
1/1   RAID1 Optl  RW     No      RWTD  -   1.818 TB two
2/2   RAID1 Optl  RW     No      RWTD  -   1.818 TB three
3/3   RAID1 Optl  RW     No      RWTD  -   1.818 TB four
----------------------------------------------------------
""")
# negative widths represent ignored padding fields
fieldwidths = 3, -3, 5, -1, 4, -2, 2, -5, 3, -5, 4, -2, 3, -1, 8, -1, 5
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                     for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
parse = fieldstruct.unpack_from

itable = iter(table.splitlines())
for line in itable:   # skip preamble up to the first dashed line
    if line.startswith('-----'):
        break
fieldnames = next(itable).split()
for line in itable:   # skip the dashed line under the header
    if line.startswith('-----'):
        break
for line in itable:   # parse data rows until the closing dashed line
    if line.startswith('-----'):
        break
    if len(line) < fieldstruct.size:
        line += ' ' * (fieldstruct.size - len(line))
    # struct parses bytes on Python 3, so encode the line and decode the fields
    fields = tuple(field.decode().strip() for field in parse(line.encode()))
    rec = OrderedDict(zip(fieldnames, fields))
    print(json.dumps(rec))
Output:
{"DG/VD": "0/0", "TYPE": "RAID1", "State": "Optl", "Access": "RW",
"Consist": "No", "Cache": "RWTD", "sCC": "-", "Size": "1.818 TB",
"Name": "one"}
{"DG/VD": "1/1", "TYPE": "RAID1", "State": "Optl", "Access": "RW",
"Consist": "No", "Cache": "RWTD", "sCC": "-", "Size": "1.818 TB",
"Name": "two"}
{"DG/VD": "2/2", "TYPE": "RAID1", "State": "Optl", "Access": "RW",
"Consist": "No", "Cache": "RWTD", "sCC": "-", "Size": "1.818 TB",
"Name": "three"}
{"DG/VD": "3/3", "TYPE": "RAID1", "State": "Optl", "Access": "RW",
"Consist": "No", "Cache": "RWTD", "sCC": "-", "Size": "1.818 TB",
"Name": "four"}
Upvotes: 2
Reputation: 955
import re
table = '''Virtual Drives = 4
VD LIST :
=======
----------------------------------------------------------
DG/VD TYPE  State Access Consist Cache sCC     Size Name
----------------------------------------------------------
0/0   RAID1 Optl  RW     No      RWTD  -   1.818 TB one
1/1   RAID1 Optl  RW     No      RWTD  -   1.818 TB two
2/2   RAID1 Optl  RW     No      RWTD  -   1.818 TB three
3/3   RAID1 Optl  RW     No      RWTD  -   1.818 TB four
----------------------------------------------------------'''
table = table.split('\n')
result = []
header = None
divider_pattern = re.compile('^[-]{20,}$')
for i, row in enumerate(table):
    row = row.strip()
    if divider_pattern.match(row) and not header:
        header = table[i + 1].split()
        continue
    if header and not divider_pattern.match(row):
        row = row.split()
        if row != header:
            result.append(dict(zip(header, row)))
print(result)
Upvotes: -1