tycoonbob

Reputation: 3

Python - line split with spaces?

I'm sure this is a basic question, but I have spent about an hour on it already and can't quite figure it out. I'm parsing smartctl output, and here is a sample of the data I'm working with:

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-39-pve] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MD04ACA500
Serial Number:    Y9MYK6M4BS9K
LU WWN Device Id: 5 000039 5ebe01bc8
Firmware Version: FP2A
User Capacity:    5,000,981,078,016 bytes [5.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jul  2 11:24:08 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

What I'm trying to achieve is pulling out the device model (for some devices it's a single word; for others, such as this one, it's two words), the serial number, the local time, and a couple of other fields. I assume it would be easiest to capture everything after the colon, but how do I get rid of the variable number of spaces?

Here is the relevant code I have come up with so far:

deviceModel = ""
serialNumber = ""
lines = infoMessage.split("\n")
for line in lines:
    parts = line.split()
    if str(parts):
        if parts[0] == "Device Model:     ":
            deviceModel = parts[1]
        elif parts[0] == "Serial Number:    ":
            serialNumber = parts[1]
vprint(3, "Device model: %s" %deviceModel)
vprint(3, "Serial number: %s" %serialNumber)

The error I keep getting is:

File "./tester.py", line 152, in parseOutput
if parts[0] == "Device Model:     ":
IndexError: list index out of range

I get what the error is saying (kinda), but I'm not sure what else the range could be, or if I'm even attempting this in the right way. Looking for guidance to get me going in the right direction. Any help is greatly appreciated.

Thanks!

Upvotes: 0

Views: 1820

Answers (7)

wwii

Reputation: 23783

When you split the blank line, parts is an empty list. You try to accommodate that by checking for an empty list, but you convert the list to a string first, and str([]) is '[]', a non-empty string, so your conditional statement is True.

>>> s = []
>>> bool(s)
False
>>> str(s)
'[]'
>>> bool(str(s))
True
>>> 

Change if str(parts): to if parts:.
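A minimal sketch of the question's loop with only that guard changed, run on a short excerpt of the sample output (`SAMPLE` is a stand-in assumed for illustration). The IndexError goes away, but neither comparison ever matches: `split()` with no arguments breaks the line on whitespace, so `parts[0]` is just `"Device"` or `"Serial"`, never `"Device Model:     "`.

```python
# SAMPLE is a hypothetical excerpt of the smartctl output from the question.
SAMPLE = """Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MD04ACA500
Serial Number:    Y9MYK6M4BS9K"""

deviceModel = ""
serialNumber = ""
for line in SAMPLE.split("\n"):
    parts = line.split()
    if parts:  # [] (from the blank line) is falsy, so that line is skipped
        if parts[0] == "Device Model:     ":
            deviceModel = parts[1]
        elif parts[0] == "Serial Number:    ":
            serialNumber = parts[1]

# No IndexError, but nothing matched either: parts[0] is a single word.
print(repr(deviceModel), repr(serialNumber))  # prints: '' ''
```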

Many would say that using a try/except block is more idiomatic in Python:

for line in lines:
    parts = line.split()
    try:
        if parts[0] == "Device Model:     ":
            deviceModel = parts[1]
        elif parts[0] == "Serial Number:    ":
            serialNumber = parts[1]
    except IndexError:
        pass
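The same try/except idea can be made to actually find the fields by splitting on the first colon instead of on whitespace. This is a sketch, with `SAMPLE` again a hypothetical excerpt of the question's output: blank lines and the `===` banner contain no colon, so `parts[1]` raises IndexError and the line is skipped.

```python
SAMPLE = """=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MD04ACA500

Serial Number:    Y9MYK6M4BS9K"""

deviceModel = ""
serialNumber = ""
for line in SAMPLE.split("\n"):
    parts = line.split(":", 1)  # at most two pieces: the label and the rest
    try:
        label, value = parts[0], parts[1].strip()
    except IndexError:
        continue  # no colon on this line (blank line, banner)
    if label == "Device Model":
        deviceModel = value
    elif label == "Serial Number":
        serialNumber = value

print(deviceModel, "/", serialNumber)  # TOSHIBA MD04ACA500 / Y9MYK6M4BS9K
```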

Upvotes: 0

celticminstrel

Reputation: 1677

I think it would be far easier to use regular expressions here.

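import re

deviceModel = ""
serialNumber = ""
for line in lines:
    # Splits the string into at most two parts
    # at the first colon which is followed by one or more spaces
    parts = re.split(r':\s+', line, maxsplit=1)
    # re.split always returns at least one element ([''] for a blank line),
    # so check the length, not truthiness, before using parts[1]
    if len(parts) == 2:
        if parts[0] == "Device Model":
            deviceModel = parts[1]
        elif parts[0] == "Serial Number":
            serialNumber = parts[1]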

Mind you, if you only care about the two fields, startswith might be better.
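To see why a length check matters with this approach, here is what `re.split(r':\s+', line, maxsplit=1)` returns for the three kinds of line in the sample. This is a quick sketch; the helper name `split_field` is hypothetical. Lines without a colon-plus-whitespace separator come back as a one-element list, never an empty one.

```python
import re

def split_field(line):
    # At most one split, at the first colon followed by whitespace.
    return re.split(r":\s+", line, maxsplit=1)

for line in ["Device Model:     TOSHIBA MD04ACA500",
             "",
             "=== START OF INFORMATION SECTION ==="]:
    print(split_field(line))
# ['Device Model', 'TOSHIBA MD04ACA500']
# ['']
# ['=== START OF INFORMATION SECTION ===']
```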

Upvotes: 0

InSilico

Reputation: 223

The IndexError occurs when split() returns an empty list (as it does for a blank line) and you then access parts[0]; accessing parts[1] would fail the same way on a line that splits into only one element.

No need for regular expressions:

deviceModel = ""
serialNumber = ""
lines = infoMessage.split("\n")

for line in lines:
    if line.startswith("Device Model:"):
        deviceModel = line.split(":", 1)[1].strip()
    elif line.startswith("Serial Number:"):
        serialNumber = line.split(":", 1)[1].strip()

print("Device model: %s" %deviceModel)
print("Serial number: %s" %serialNumber)
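Since the question also wants the time and a few other fields, one option (a sketch, not part of the answer above; `parse_info` is a hypothetical name) is to collect every `Label: value` line into a dict with `str.partition`. It splits at the first colon only, so a time like `11:24:08` survives intact.

```python
def parse_info(text):
    """Map each 'Label: value' line of smartctl output to {label: value}."""
    info = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # sep is "" when the line contains no colon
            # A repeated label (e.g. "SMART support is") keeps its last value.
            info[label.strip()] = value.strip()
    return info

SAMPLE = """Device Model:     TOSHIBA MD04ACA500
Serial Number:    Y9MYK6M4BS9K

Local Time is:    Thu Jul  2 11:24:08 2015 EDT"""

info = parse_info(SAMPLE)
print(info["Device Model"])   # TOSHIBA MD04ACA500
print(info["Local Time is"])  # Thu Jul  2 11:24:08 2015 EDT
```

Lines such as the Copyright line, whose URL contains a colon, also produce an unwanted entry, so look up only the labels you need.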

Upvotes: 2

smbullet

Reputation: 313

The way I would debug this is by printing out parts at every iteration. Try that and show us what the list is when it fails.

Edit: Your problem is most likely what @jonrsharpe said: parts is probably an empty list when the loop reaches a blank line, and str(parts) returns '[]', a non-empty string, which is truthy. Try testing that.
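A minimal sketch of that debugging step, using a hypothetical helper `empty_split_lines` and a short stand-in for `infoMessage`. It prints each line with its `split()` result and returns the line numbers where `parts` is empty, which are exactly the lines where `parts[0]` raises IndexError.

```python
def empty_split_lines(text):
    bad = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        parts = line.split()
        print(lineno, repr(line), parts)  # inspect parts on every iteration
        if not parts:
            bad.append(lineno)
    return bad

infoMessage = "Copyright (C) 2002-11 by Bruce Allen\n\n=== START OF INFORMATION SECTION ==="
print(empty_split_lines(infoMessage))  # last line printed: [2]
```

Note that a trailing newline at the end of the captured output also produces an empty last line.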

Upvotes: 0

ljk

Reputation: 628

Whichever version you're running (2.7 or 3), line.split() with no arguments splits the line on runs of whitespace, so for the Device Model line:

>>> line = "Device Model:     TOSHIBA MD04ACA500"
>>> parts = line.split()
>>> parts
['Device', 'Model:', 'TOSHIBA', 'MD04ACA500']

You can also try line.startswith() to find the lines you want https://docs.python.org/2/library/stdtypes.html#str.startswith
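If you prefer to keep `split()`, here is a sketch of matching on the leading words and re-joining the rest. Re-joining collapses runs of internal spaces, so a value like `Thu Jul  2` would come back as `Thu Jul 2`.

```python
line = "Device Model:     TOSHIBA MD04ACA500"
parts = line.split()  # ['Device', 'Model:', 'TOSHIBA', 'MD04ACA500']

deviceModel = ""
if parts[:2] == ["Device", "Model:"]:  # slicing never raises IndexError
    deviceModel = " ".join(parts[2:])

print(deviceModel)  # TOSHIBA MD04ACA500
```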

Upvotes: 0

Aderstedt

Reputation: 6518

Try using regular expressions:

import re

r = re.compile(r"^[^:]*:\s+(.*)$")
m = r.match("Device Model:     TOSHIBA MD04ACA500")
print(m.group(1))   # Prints "TOSHIBA MD04ACA500"
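The same pattern idea also works on the whole output at once, without a loop, by anchoring on the field name and passing `re.MULTILINE` so that `^` and `$` match at each line. A sketch; `get_field` is a hypothetical helper and `SAMPLE` a stand-in for the captured output.

```python
import re

def get_field(text, name):
    """Return the text after 'name:' and its padding, or None if the field is absent."""
    # re.escape guards against regex metacharacters in the field name;
    # [ \t]+ (not \s+) keeps the match from running onto the next line.
    pattern = r"^" + re.escape(name) + r":[ \t]+(.*)$"
    m = re.search(pattern, text, re.MULTILINE)
    return m.group(1) if m else None

SAMPLE = """Device Model:     TOSHIBA MD04ACA500
Serial Number:    Y9MYK6M4BS9K
Local Time is:    Thu Jul  2 11:24:08 2015 EDT"""

print(get_field(SAMPLE, "Device Model"))   # TOSHIBA MD04ACA500
print(get_field(SAMPLE, "Local Time is"))  # Thu Jul  2 11:24:08 2015 EDT
print(get_field(SAMPLE, "Rotation Rate"))  # None
```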

Upvotes: 0

Sait

Reputation: 19855

I guess your problem is the empty line in the middle, because:

>>> '\n'.split()
[]

You can do something like this:

>>> f = open('a.txt')
>>> lines = f.readlines()
>>> deviceModel = [line for line in lines if 'Device Model' in line][0].split(':')[1].strip()
# 'TOSHIBA MD04ACA500'
>>> serialNumber = [line for line in lines if 'Serial Number' in line][0].split(':')[1].strip()
# 'Y9MYK6M4BS9K'
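Indexing the list comprehension with `[0]` raises IndexError again if a field is missing from some drive's output. A sketch that uses `next()` with a default instead (`first_value` is a hypothetical helper; the commented `with` block shows how the answer's `a.txt` could be read so the file is closed afterwards):

```python
def first_value(lines, label, default=""):
    """Return the stripped text after the first colon on the first line containing label."""
    line = next((ln for ln in lines if label in ln), None)
    if line is None:
        return default  # field not present in this output
    return line.partition(":")[2].strip()

# Reading the file from the answer:
# with open("a.txt") as f:
#     lines = f.readlines()
lines = ["Device Model:     TOSHIBA MD04ACA500\n", "\n", "Serial Number:    Y9MYK6M4BS9K\n"]
print(first_value(lines, "Device Model"))   # TOSHIBA MD04ACA500
print(first_value(lines, "Serial Number"))  # Y9MYK6M4BS9K
```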

Upvotes: 0
