Reputation: 303
I am writing a wrapper for smartctl in python 2.7.3...
I am having a hell of a time trying to wrap my head around how to parse the output from the smartctl
program in Linux (Ubuntu x64 to be specific)
I am running smartctl -l selftest /dev/sdx
via subprocess and grabbing the output into a variable
This variable is broken up into a list, then I drop the useless header data and blank lines from the output.
Now, I am left with a list of strings, which is great!
The data is sort-of tabular, and I want to parse it into a dict() full of lists (I think this is the correct way to represent tabular data in Python from reading the docs)
Here's a sample of the data:
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 44796 -
# 2 Short offline Completed without error 00% 44796 -
# 3 Short offline Completed without error 00% 44796 -
# 4 Short offline Completed without error 00% 44796 -
# 5 Short offline Completed without error 00% 44796 -
# 6 Extended offline Completed without error 00% 44771 -
# 7 Short offline Completed without error 00% 44771 -
# 8 Short offline Completed without error 00% 44741 -
# 9 Short offline Completed without error 00% 1 -
#10 Short offline Self-test routine in progress 70% 44813 -
I can see some issues with trying to parse this, and am open to solutions, but i may also just be doing this all wrong ;-):
Self-test routine in progress
flows past the first character of the text Remaining
Num
column, the numbers after 9
are not separated from the #
character by a spaceI might be way off-base here, but this is my first time trying to parse something this eccentric.
Thank everyone who even bothers to read this wall of text in advance!!!
Here's my code so far, if anyone feels it necessary or finds it useful:
#testStatus.py
#This module provides an interface for retrieving
#test status and results for ongoing and completed
#drive tests
import subprocess
#this function takes a list of strings and removes
#strings which do not have pertinent information
def cleanOutput(data):
cleanedOutput = []
del data[0:3] #This deletes records 0-3 (lines 1-4) from the list
for item in data:
if item == '': #This removes blank items from remaining list
pass
else:
cleanedOutput.append(item)
return cleanedOutput
def resultsOutput(data):
headerLines = []
resultsLines = []
resultsTable = {}
for line in data:
if "START OF READ" in line or "log structure revision" in line:
headerLines.append(line)
else:
resultsLines.append(line)
nameLine = resultsLines[0].split()
print nameLine
def getStatus(sdxPath):
try:
output = subprocess.check_output(["smartctl", "-l", "selftest", sdxPath])
except subprocess.CalledProcessError:
print ("smartctl command failed...")
except Exception as e:
print (e)
splitOutput = output.split('\n')
cleanedOutput = cleanOutput(splitOutput)
resultsOutput(cleanedOutput)
#For Testing
getStatus("/dev/sdb")
Upvotes: 5
Views: 4940
Reputation: 1818
For what it's worth (this is an old question): smartctl
has a --json
flag which you can use and then parse the output like normal JSON since version 7.0
Upvotes: 7
Reputation: 865
The main parsing problem seems to be the first three columns; the remaining data is more straight forward. Assuming the output uses blanks between fields (instead of tab characters, which would be much easier to parse), I'd go for fixed length parsing, something like:
num = line[1:2]
desc = line[5:25]
status = line[25:54]
remain = line[54:58]
lifetime = line[60:68]
lba = line[77:99]
The header line would be handled differently. What structure you put the data into depends on what you want to do with it. A dictionary keyed by "num" might be appropriate if you mainly wanted to randomly access data by that "num" identifier. Otherwise a list might be better. Each entry (per line) could be a tuple, a list, a dictionary, a class instance, or maybe other things. If you want to access fields by name, then a dictionary or class instance per entry might be appropriate.
Upvotes: 1