Reputation: 1659
I have this text file
VENDOR
ITEM NUMBER WH ITEM DESCRIPTION PRODUCT NUMBER PRICE DISC % MAIN-WH ALT-WH BIN#
--------------- ----- -- ------------------------------ --------------- --------- ------ --------- --------- ------
0.00 EA 0.00 0 0
10.5PLC/TLED/26V/27K 14.5W 4PIN CFL REPL 2700K VERT 458406 20.00 EA 0.00 0 0 I68 I68
(00029 )
10.5PLC/TLED/26V/30K 14.5W 4PIN CFL REPL 3000K VERT 458414 20.00 EA 0.00 3 0 PAYOFF I68 I68
(00029 )
10.5PLC/TLED/26V/35K 14.5W 4PIN CFL REPL 3500K VERT 458422 20.00 EA 0.00 0 0 I68 I68
(00029 )
10.5PLC/TLED/26V/40K 14.5W 4PIN CFL REPL 4000K VERT 458430 20.00 EA 0.00 0 0 I68 I68
(00029 )
I want to read each line item and get the item Number, description, Vendor product number and price.
I tried using this python code
def readInventoryFile():
# dataFile = open("inventoryFiles/INV.txt","r")
with open('inventoryFiles/INV.txt') as dataFile:
for lineItem in dataFile:
itemProperties = lineItem.split(" ")
while("" in itemProperties) :
itemProperties.remove("")
print(itemProperties)
try:
itemNum = itemProperties[0]
itemDesc = itemProperties[1]
partNumb = itemProperties[2]
price = itemProperties[3]
itemSummry = {
"Name": itemDesc,
"Price": price,
"PN": partNumb,
}
print(lineItem, "\n ",itemProperties,"\n Summary ",itemSummry)
except Exception as e:
print(e)
The code partially works but it's hard to split the line by spaces or other factors because there are spaces that separated and within the content of each line. How could I get the desired product properties?
Upvotes: 0
Views: 125
Reputation: 123531
I think my answer to the question How to efficiently parse fixed width files? can be adapted to do what you want.
The main modification to the code in that answer was to make it also strip any leading and trailing spaces in each field. Below is Python 3.x code illustrating this:
from __future__ import print_function
import struct
HEADER_LINES = 5
# Indices 0 1 2 3 4 5 6 7
fieldwidths = (20, -5, 37, -10, 12, -1, 6, -1, 9, -1, 9, -1, 10, -1, 7)
# Convert fieldwidths into a format compatible with struct module.
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
#print('fmtstring: {!r}, recsize: {} chars\n'.format(fmtstring, fieldstruct.size))
unpack_from = fieldstruct.unpack_from # To optimize calls.
def parse(line):
""" Return unpacked fields in string line, stripped of any leading and
trailing whitespace.
"""
return list(s.decode().strip() for s in unpack_from(line.encode()))
def readInventoryFile(filename):
with open(filename) as invfile:
for _ in range(HEADER_LINES):
next(invfile) # Skip header lines.
for line in invfile:
if len(line) < fieldstruct.size: # Pad line if it's too short.
line = line + (' ' * (fieldstruct.size-len(line)))
fields = parse(line)
if fields[0]: # First field non-blank?
print(fields)
readInventoryFile('inventoryFiles_INV.txt')
Results:
['10.5PLC/TLED/26V/27K', '14.5W 4PIN CFL REPL 2700K VERT 458406', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['10.5PLC/TLED/26V/30K', '14.5W 4PIN CFL REPL 3000K VERT 458414', '20.00 EA', '0.00', '3', '0', 'PAYOFF I68', 'I68']
['10.5PLC/TLED/26V/35K', '14.5W 4PIN CFL REPL 3500K VERT 458422', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['10.5PLC/TLED/26V/40K', '14.5W 4PIN CFL REPL 4000K VERT 458430', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['1000PAR64/FFR', '1000W PAR64 HALOGEN GX16D BASE 56217', '50.00 EA', '0.00', '0', '0', 'I10', '']
['1000PAR64/WFL/S', '1000W PAR64 HALOGEN GX16D BASE S4673', '0.00 EA', '0.00', '0', '0', '', 'I105']
['100A/99', '100W A19 EXTENDED SERVICE 229781', '2.62 EA', '0.00', '0', '0', 'W6-2 I70', 'I11']
['100A/CL', '100W A19 130V CLEAR 375279', '0.99 EA', '0.00', '0', '0', 'A2-2 I70', 'I11']
In a nutshell, this code leverages Python's struct
module ability to split or "unpack" a "buffer" full of data into fixed-with "fields" each containing a certain number of characters.
Although more commonly applied to binary data, it also works on character strings that have been converted into arrays of bytes (which isn't necessary in Python 2.x). Basically you give it a format string specifying the characteristics of each these fields (type and size), along with the data to parse (a line from the file in this case), which it then unpacks accordingly and returns the results as a list of values.
Upvotes: 1
Reputation: 1404
Since your file is formatted in an inconvenient way I can only see this trick as solution: You can look at the headers (feature names) and parse all the lines according to the indices at which they appear, like such
import numpy as np
with open('/Users/Copo1/Desktop/aaa.txt') as dataFile:
lines = dataFile.readlines()
headers = ['ITEM NUMBER','WH ITEM DESCRIPTION', 'PRODUCT NUMBER', 'PRICE']
starts = [lines[1].find(h) for h in headers]
starts.append(len(lines[0]))
headers.append(' ')
items = [[line[starts[i]:starts[i+1]] for line in lines] for i,h in zip(range(len(starts)-1), headers[:-1]) ]
this produces the following output for the items
list (pasting only the first element, corresponding to 'ITEM NUMBER', the other elements are correct as well you can check).
[[' ',
'ITEM NUMBER ',
'--------------- ----- ',
' ',
' \n',
'10.5PLC/TLED/26V/27K ',
' ',
' \n',
'10.5PLC/TLED/26V/30K ',
' ',
' \n',
'10.5PLC/TLED/26V/35K ',
' ',
' \n',
'10.5PLC/TLED/26V/40K ',
' ',
' \n',
'1000PAR64/FFR ',
' ',
' \n',
'1000PAR64/WFL/S ',
' ',
' \n',
'100A/99 ',
' ',
' \n',
'100A/CL ',
' ',
' '],
There might be some extra easy polishing left to do after this (like removing empty strings and '\n's) but I am sure you can figure out yourself.
Upvotes: 0