Reputation: 526
I'm processing data from a readout from a storage device in this format:
id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:config_node:UPS_unique_id:hardware:iscsi_name:iscsi_alias:panel_name:enclosure_id:canister_id:enclosure_serial_number:site_id:site_name
10:node_A::00A550:online:0:io_grp0:yes::SV1:iqn.1986-03.com:2145.test.nodeA::A:::::
15:node_B::00A548:online:0:io_grp0:no::SV1:iqn.1986-03.com.:2145.test.nodeB::B:::::
How can I read that data as a 2D array, so that I can look up values like datarray['15']['status']?
I tried this way:
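# Create array
datarray = []
# Loop through the list of lines (lis holds the readout, one line per item)
for i, x in enumerate(lis):
    # Split on the delimiter
    linesplit = x.split(":")
    row = []
    # Store each field together with its position in the line
    for lsi, lsx in enumerate(linesplit):
        row.append([lsi, lsx])
    datarray.append(row)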
But that seems to slice the data wrong:
[[[0, u'id'], [1, u'name'], [2, u'UPS_serial_number'], [3, u'WWNN'], [4, u'status'], [5, u'IO_group_id'], [6, u'IO_group_name'], [7, u'config_node'], [8, u'UPS_unique_id'], [9, u'hardware'], [10, u'iscsi_name'], [11, u'iscsi_alias'], [12, u'panel_name'], [13, u'enclosure_id'],
Upvotes: 2
Views: 716
Reputation: 26
From the sample, the data looks colon-separated (:) with a header on the first line. If that is the case, you can load it into a pandas DataFrame the same way you would load a CSV file, using sep=':', and then convert that DataFrame to a NumPy array.
import pandas as pd
import os
os.chdir('/Users/Downloads/')
df = pd.read_csv('train.txt',sep=':')
df
id name UPS_serial_number WWNN status IO_group_id IO_group_name config_node UPS_unique_id hardware iscsi_name iscsi_alias panel_name enclosure_id canister_id enclosure_serial_number site_id site_name
10 node_A NaN 00A550 online 0 io_grp0 yes NaN SV1 iqn.1986-03.com 2145.test.nodeA NaN A NaN NaN NaN NaN NaN
15 node_B NaN 00A548 online 0 io_grp0 no NaN SV1 iqn.1986-03.com. 2145.test.nodeB NaN B NaN NaN NaN NaN NaN
df.to_numpy()  # df.as_matrix() on older pandas; as_matrix() was removed in pandas 1.0
array([['node_A', nan, '00A550', 'online', 0, 'io_grp0', 'yes', nan,
'SV1', 'iqn.1986-03.com', '2145.test.nodeA', nan, 'A', nan, nan,
nan, nan, nan],
['node_B', nan, '00A548', 'online', 0, 'io_grp0', 'no', nan,
'SV1', 'iqn.1986-03.com.', '2145.test.nodeB', nan, 'B', nan, nan,
nan, nan, nan]], dtype=object)
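One thing worth checking in the output above: the id values (10 and 15) do not appear in the array. The following is a minimal sketch, using only the standard library, that counts the fields in the sample from the question. The variable names are illustrative. It suggests the data rows carry one more colon-separated field than the header, which would explain why pandas appears to treat the first field as the index and shift the remaining columns by one.

```python
# Sample readout copied from the question.
header = ("id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:"
          "config_node:UPS_unique_id:hardware:iscsi_name:iscsi_alias:panel_name:"
          "enclosure_id:canister_id:enclosure_serial_number:site_id:site_name")
rows = [
    "10:node_A::00A550:online:0:io_grp0:yes::SV1:iqn.1986-03.com:2145.test.nodeA::A:::::",
    "15:node_B::00A548:online:0:io_grp0:no::SV1:iqn.1986-03.com.:2145.test.nodeB::B:::::",
]

# Compare the number of header names with the number of fields per row.
header_fields = header.split(":")
row_field_counts = [len(row.split(":")) for row in rows]

print(len(header_fields))  # 18
print(row_field_counts)    # [19, 19]
```

If the counts differ like this in the real file too, it is worth confirming that a column such as status holds the values you expect before relying on the DataFrame.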
Upvotes: 1
Reputation: 82929
Use a csv.DictReader to read the individual lines as dictionaries, then use a dictionary comprehension to build the "outer" dict that maps each id value to the inner dict for that row.
import csv

raw = """id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:config_node:UPS_unique_id:hardware:iscsi_name:iscsi_alias:panel_name:enclosure_id:canister_id:enclosure_serial_number:site_id:site_name
10:node_A::00A550:online:0:io_grp0:yes::SV1:iqn.1986-03.com:2145.test.nodeA::A:::::
15:node_B::00A548:online:0:io_grp0:no::SV1:iqn.1986-03.com.:2145.test.nodeB::B:::::"""
reader = csv.DictReader(raw.splitlines(), delimiter=":")
result = {line["id"]: line for line in reader}
print(result["15"]["status"]) # 'online'
Note that this is not a 2D array but a dictionary of dictionaries (dictionaries being associative arrays). With a plain 2D array, a lookup like result["15"]["status"] would not work.
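One caveat with the sample data: each row splits into 19 fields while the header has 18 names, apparently because the iscsi_name value itself contains a colon. Splitting on every colon therefore still gives the right status, but the fields after iscsi_name end up one position off. Below is a minimal sketch of one way around that, under the assumption that the extra colon only ever occurs inside iscsi_name. The helper name parse_readout and its wide_field parameter are made up for illustration. It splits the leading fields from the left and the trailing fields from the right, so whatever is left in the middle stays together.

```python
def parse_readout(text, wide_field="iscsi_name"):
    """Parse the readout into {id: {column: value}}.

    Assumes only `wide_field` may contain the ':' delimiter.
    """
    lines = text.splitlines()
    names = lines[0].split(":")
    pos = names.index(wide_field)      # number of columns before the wide field
    n_after = len(names) - pos - 1     # number of columns after the wide field
    result = {}
    for line in lines[1:]:
        # Split off the first `pos` fields; the last item is the unsplit remainder.
        head = line.split(":", pos)
        # Split the trailing fields off the remainder from the right.
        tail = head.pop().rsplit(":", n_after)
        values = head + tail
        result[values[0]] = dict(zip(names, values))
    return result


raw = """id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:config_node:UPS_unique_id:hardware:iscsi_name:iscsi_alias:panel_name:enclosure_id:canister_id:enclosure_serial_number:site_id:site_name
10:node_A::00A550:online:0:io_grp0:yes::SV1:iqn.1986-03.com:2145.test.nodeA::A:::::
15:node_B::00A548:online:0:io_grp0:no::SV1:iqn.1986-03.com.:2145.test.nodeB::B:::::"""

result = parse_readout(raw)
print(result["15"]["status"])      # online
print(result["10"]["iscsi_name"])  # iqn.1986-03.com:2145.test.nodeA
print(result["15"]["panel_name"])  # B
```

As with the DictReader version, the keys of the outer dict are strings, so the lookup is result["15"], not result[15].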
Upvotes: 1