Reputation: 191
Problem: A command generates a table that makes it very hard to script around.
Solution: Convert the table into Python dictionaries, which are far easier to script against. These tables can contain 1-20 virtual drives, and attributes such as "Name" may not be set.
Example Table:
Virtual Drives = 4
VD LIST :
=======
----------------------------------------------------------
DG/VD TYPE   State Access Consist Cache sCC     Size Name
----------------------------------------------------------
0/0   RAID1  Optl  RW     No      RWTD  -   1.818 TB one
1/1   RAID1  Optl  RW     No      RWTD  -   1.818 TB two
2/2   RAID1  Optl  RW     No      RWTD  -   1.818 TB three
3/3   RAID1  Optl  RW     No      RWTD  -   1.818 TB four
4/4   RAID10 Reblg RW     No      RWTD  -   4.681 TB
----------------------------------------------------------
Example Dictionary:
{"DG/VD":"0/0", "TYPE":"RAID1", "State":"Optl", "Access":"RW", "Consist":"No", "Cache":"RWTD", "sCC":"-", "Size":"1.818 TB", "Name":"one"}
{"DG/VD":"4/4", "TYPE":"RAID10", "State":"Reblg", "Access":"RW", "Consist":"No", "Cache":"RWTD", "sCC":"-", "Size":"4.681 TB", "Name":""}
There would be one dictionary for each virtual drive. How best to solve this issue?
I have some ideas. First, search for the table header and split it on whitespace to define one list. Second, search for each virtual drive row matching "number/number" and split it on whitespace to define a second list. However, 'Size' will need special handling, since the space between the number and "TB" must not be treated as a separator.
Next, zip the two lists together to generate a dictionary. Does anybody have a better idea for manipulating this text?
# Create a list of all the headers in the virtual disk table
get_table_header = " /c0/vall show | awk '/^DG\/VD/'"
table_header_values = console_info(utility + get_table_header).split()
['DG/VD', 'TYPE', 'State', 'Access', 'Consist', 'Cache', 'sCC', 'Size', 'Name']
# Create a list of all virtual drives
get_virtual_drives = " /c0/vall show | awk '/^[0-9]\/[0-9]/'"
virtual_drive_values = console_info(utility + get_virtual_drives).split()
["0/0", "RAID1", "Optl", "RW", "No", "RWTD", "-", "1.818", "TB", "0"]
Upvotes: 5
Views: 31895
Reputation: 3277
You can try something like this:
import re
lines = re.split("\n", data)
for line in lines[8:-1]:
fields = re.split(" +",line)
print(fields)
where data contains your table.
The logic is to split the table into single lines, then split every line into fields using two or more spaces as the separator (note the re.split("  +", line)). The trick is to start from the 8th line and stop before the last line.
Once you have each line split into a list containing its fields, building a dictionary is simple.
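A minimal sketch of that last step, with a couple of data rows inlined so it runs standalone (the header list is taken from the question's table):

```python
import re

# Hypothetical stand-in for two already-extracted data rows.
data = ("0/0   RAID1   Optl   RW   No   RWTD   -   1.818 TB   one\n"
        "4/4   RAID10  Reblg  RW   No   RWTD   -   4.681 TB")

header = ['DG/VD', 'TYPE', 'State', 'Access', 'Consist', 'Cache', 'sCC', 'Size', 'Name']
result = []
for line in data.split("\n"):
    # Two or more spaces as the separator keeps "1.818 TB" in one field.
    fields = re.split("  +", line)
    result.append(dict(zip(header, fields)))
print(result)
```

Note that zip simply drops the 'Name' key when a row has no name; pad the fields with '' if that key must always be present.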
Upvotes: 0
Reputation:
One way to do it is by reading the table into a pandas DataFrame and using its to_dict method to convert it to a dict.
This requires copying the raw table to a file, removing the dashed lines, and then adding an extra column header after Size (perhaps called Units) to accommodate the 'TB' data. Alternatively, skip the extra column and remove or replace the whitespace (perhaps with a dash) between each Size value and 'TB'.
Then the file can be loaded into a pandas DataFrame (df) as a '\s+'-separated csv file with pd.read_csv, and converted to a dict with the to_dict method. Passing orient='index' yields one dict per row; it is equivalent to converting the transpose df.T (the same as df.transpose()), since transposing a 2D table just interchanges its columns and rows.
Starting with table.txt containing:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1/1 RAID1 Optl RW No RWTD - 1.818 TB two
2/2 RAID1 Optl RW No RWTD - 1.818 TB three
3/3 RAID1 Optl RW No RWTD - 1.818 TB four
The following code converts it to a dict named table_dict:
import pandas as pd
table_dict = pd.read_csv('table.txt',sep='\s+').to_dict(orient= 'index')
table_dict consists of 4 dicts with keys in range(4) and looks like:
import pprint
pprint.pprint(table_dict)
{0: {'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '0/0',
'Name': 'one',
'Size': 1.818,
'State': 'Optl',
'TYPE': 'RAID1',
'Units': 'TB',
'sCC': '-'},
1: {'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '1/1',
'Name': 'two',
'Size': 1.818,
'State': 'Optl',
'TYPE': 'RAID1',
'Units': 'TB',
'sCC': '-'},
2: {'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '2/2',
'Name': 'three',
'Size': 1.818,
'State': 'Optl',
'TYPE': 'RAID1',
'Units': 'TB',
'sCC': '-'},
3: {'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '3/3',
'Name': 'four',
'Size': 1.818,
'State': 'Optl',
'TYPE': 'RAID1',
'Units': 'TB',
'sCC': '-'}}
pandas DataFrame has additional methods for conversion to other formats including JSON, HTML, SQL and more.
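For example, the same data can be emitted as JSON (a sketch: the preprocessed table is inlined via io.StringIO here rather than read from table.txt):

```python
import io
import pandas as pd

# Same preprocessed table as above, held in memory instead of a file.
table = """DG/VD TYPE State Access Consist Cache sCC Size Units Name
0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1/1 RAID1 Optl RW No RWTD - 1.818 TB two"""

df = pd.read_csv(io.StringIO(table), sep=r'\s+')
print(df.to_json(orient='records'))  # one JSON object per virtual drive
```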
Upvotes: 3
Reputation: 180391
from itertools import dropwhile, takewhile

with open("test.txt") as f:
    dp = dropwhile(lambda x: not x.startswith("-"), f)
    next(dp)                  # skip ----
    names = next(dp).split()  # get header names
    next(f)                   # skip -----
    out = []
    for line in takewhile(lambda x: not x.startswith("-"), f):
        a, b = line.rsplit(None, 1)
        out.append(dict(zip(names, a.split(None, 7) + [b])))
Output:
from pprint import pprint as pp
pp(out)
[{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '0/0',
'Name': 'one',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '1/1',
'Name': 'two',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '2/2',
'Name': 'three',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '3/3',
'Name': 'four',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'}]
If you want to maintain order, use an OrderedDict (from collections import OrderedDict):
out = [OrderedDict(zip(names, line.split()))
       for line in takewhile(lambda x: not x.startswith("-"), f)]
For missing Name values as per your edit:
from itertools import dropwhile, takewhile

with open("test.txt") as f:
    dp = dropwhile(lambda x: not x.startswith("-"), f)
    next(dp)                  # skip ----
    names = next(dp).split()  # get header names
    next(f)                   # skip -----
    out = []
    for line in takewhile(lambda x: not x.startswith("-"), f):
        a, b = line.rsplit(" ", 1)
        out.append(dict(zip(names, a.rstrip().split(None, 7) + [b.rstrip()])))
Output:
[{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '0/0',
'Name': 'one',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '1/1',
'Name': 'two',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '2/2',
'Name': 'three',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '3/3',
'Name': 'four',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '4/4',
'Name': '',
'Size': '4.681 TB',
'State': 'Reblg',
'TYPE': 'RAID10',
'sCC': '-'}]
This will also handle lines with multiple spaces between "TB" and the Name column value, e.g. "1.818 TB    one".
Upvotes: 7
Reputation: 16776
There is a pandas function for exactly this, read_fwf: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_fwf.html
import pandas as pd
data = pd.read_fwf('filename.txt')
But you have to preprocess the table to this format:
DG/VD TYPE  State Access Consist Cache sCC  Size Units Name
0/0   RAID1 Optl  RW     No      RWTD  -   1.818 TB    one
1/1   RAID1 Optl  RW     No      RWTD  -   1.818 TB    two
2/2   RAID1 Optl  RW     No      RWTD  -   1.818 TB    three
3/3   RAID1 Optl  RW     No      RWTD  -   1.818 TB    four
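That preprocessing can itself be scripted. A sketch, under the assumption that data rows are exactly the lines starting with a digit; the header (with the extra Units column) is supplied via the names= keyword instead of being read from the file:

```python
import io
import pandas as pd

raw = """Virtual Drives = 4
VD LIST :
=======
----------------------------------------------------------
DG/VD TYPE State Access Consist Cache sCC Size Name
----------------------------------------------------------
0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1/1 RAID1 Optl RW No RWTD - 1.818 TB two
----------------------------------------------------------"""

# Keep only the data rows (they start with a digit) and supply the header
# ourselves, with the extra Units column for the "TB" token.
names = ['DG/VD', 'TYPE', 'State', 'Access', 'Consist', 'Cache',
         'sCC', 'Size', 'Units', 'Name']
rows = [line for line in raw.splitlines() if line[:1].isdigit()]
df = pd.read_fwf(io.StringIO('\n'.join(rows)), names=names)
print(df)
```

This relies on read_fwf inferring the column widths from the data rows, which works here because every row has the same layout.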
Upvotes: -1
Reputation: 123393
You could use the struct
module to parse the data in the lines of the table, as shown below. The code stores each record in an OrderedDict
to preserve field order in the dictionaries produced, but doing that is optional. The "Name" field does not have to be present.
from __future__ import print_function
from collections import OrderedDict
import json # for pretty-printing results
import struct
from textwrap import dedent
table = dedent("""
Virtual Drives = 4
VD LIST :
=======
----------------------------------------------------------
DG/VD TYPE  State Access Consist Cache sCC     Size Name
----------------------------------------------------------
0/0   RAID1 Optl  RW     No      RWTD  -   1.818 TB one
1/1   RAID1 Optl  RW     No      RWTD  -   1.818 TB two
2/2   RAID1 Optl  RW     No      RWTD  -   1.818 TB three
3/3   RAID1 Optl  RW     No      RWTD  -   1.818 TB four
----------------------------------------------------------
""")
# negative widths represent ignored padding fields
fieldwidths = 3, -3, 5, -1, 4, -2, 2, -5, 3, -5, 4, -2, 3, -1, 8, -1, 5
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                     for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
parse = fieldstruct.unpack_from

itable = iter(table.splitlines())
for line in itable:   # skip preamble up to the first dashed line
    if line.startswith('-----'):
        break
fieldnames = next(itable).split()
for line in itable:   # skip the dashed line under the header
    if line.startswith('-----'):
        break
for line in itable:   # parse data rows until the closing dashed line
    if line.startswith('-----'):
        break
    if len(line) < fieldstruct.size:
        line += ' ' * (fieldstruct.size - len(line))
    # struct parses bytes on Python 3, so encode the line and decode the fields
    fields = tuple(field.decode().strip() for field in parse(line.encode()))
    rec = OrderedDict(zip(fieldnames, fields))
    print(json.dumps(rec))
Output:
{"DG/VD": "0/0", "TYPE": "RAID1", "State": "Optl", "Access": "RW",
"Consist": "No", "Cache": "RWTD", "sCC": "-", "Size": "1.818 TB",
"Name": "one"}
{"DG/VD": "1/1", "TYPE": "RAID1", "State": "Optl", "Access": "RW",
"Consist": "No", "Cache": "RWTD", "sCC": "-", "Size": "1.818 TB",
"Name": "two"}
{"DG/VD": "2/2", "TYPE": "RAID1", "State": "Optl", "Access": "RW",
"Consist": "No", "Cache": "RWTD", "sCC": "-", "Size": "1.818 TB",
"Name": "three"}
{"DG/VD": "3/3", "TYPE": "RAID1", "State": "Optl", "Access": "RW",
"Consist": "No", "Cache": "RWTD", "sCC": "-", "Size": "1.818 TB",
"Name": "four"}
Upvotes: 2
Reputation: 955
import re
table = '''Virtual Drives = 4
VD LIST :
=======
----------------------------------------------------------
DG/VD TYPE  State Access Consist Cache sCC     Size Name
----------------------------------------------------------
0/0   RAID1 Optl  RW     No      RWTD  -   1.818 TB one
1/1   RAID1 Optl  RW     No      RWTD  -   1.818 TB two
2/2   RAID1 Optl  RW     No      RWTD  -   1.818 TB three
3/3   RAID1 Optl  RW     No      RWTD  -   1.818 TB four
----------------------------------------------------------'''
table = table.split('\n')
result = []
header = None
divider_pattern = re.compile('^[-]{20,}$')
for i, row in enumerate(table):
    row = row.strip()
    if divider_pattern.match(row) and not header:
        header = table[i + 1].split()
        continue
    if header and not divider_pattern.match(row):
        row = row.split()
        if row != header:
            result.append(dict(zip(header, row)))
print(result)
Upvotes: -1