Reputation: 23
I'm an experienced C programmer but a complete Python newbie. I'm learning Python mostly for fun, and as a first exercise I want to parse a text file, extract the meaningful bits from the fluff, and end up with a tab-delimited string of those bits in a different order.
I've had a blast plowing through tutorials, documentation, and Stack Overflow Q&As, merrily splitting strings, reading lines from files, and so on. Now I think I'm at the point where I need a few road signs from experienced folks to avoid blind alleys.
Here's one chunk of the text I want to parse (you may recognize this as a McMaster order). The actual file will contain one or more chunks like this.
1 92351A603 Lag Screw for Wood, 18-8 Stainless Steel, 5/16" Diameter, 5" Long, packs of 5
Your Part Number: 7218-GYROID
22
packs today
5.85
per pack 128.70
Note that the information is split over several lines in the file. I'd like to end up with a tab-delimited string that looks like this:
22\tpacks\tLag Screw for Wood, 18-8 Stainless Steel, 5/16" Diameter, 5" Long, packs of 5\t\t92351A603\t5.85\t\t128.70\t7218-GYROID\n
So I need to extract some parts of the string while ignoring others, rearrange them a bit, and re-pack them into a string.
Here's the (very early) code I have at the moment. It reads the file a line at a time and splits each line on the delimiters, so I end up with several lists of strings, including a bunch of empty ones where there were double tabs:
import sys
import string

def split(delimiters, string, maxsplit=0):
    """Split the given string with the given delimiters (an array of strings)
    This function lifted from stackoverflow in a post by Kos"""
    import re
    regexPattern = '|'.join(map(re.escape, delimiters))
    return re.split(regexPattern, string, maxsplit)

delimiters = "\t", "\n", "\r", "Your Part Number: "

with open(sys.argv[1], 'r') as f:
    for line in f:
        print(split( delimiters, line))

f.close()
Question 1 is basic: how can I remove the empty strings from my lists, then mash all the strings together into one list? In C I'd loop through all the lists, ignoring the empties and sticking the other strings in a new list. But I have a feeling Python has a more elegant way to do this sort of thing.
Question 2 is more open ended: what's a robust strategy here? Should I read more than one line at a time in the first place? Make a dictionary, allowing easier re-ordering of the items later?
Sorry for the novel. Thanks for any pointers. Stylistic comments are more than welcome; style matters.
Upvotes: 2
Views: 573
Reputation: 21906
You can remove empty strings with:
new_list = filter(None, old_list)
Replace the first parameter with a lambda expression that is True for the elements you want to keep. Passing None is equivalent to lambda x: x, which keeps every truthy element.
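One caveat for Python 3: filter returns a lazy iterator rather than a list, so wrap it in list() if you want to index or print the result. A minimal sketch, where the contents of old_list are made up to resemble what the split in the question might return for a line with doubled delimiters:

```python
# Hypothetical sample: a split result containing empty strings.
old_list = ["22", "", "packs", "", "today", ""]

# filter(None, ...) keeps only truthy items; list() materializes the iterator.
new_list = list(filter(None, old_list))

# An equivalent list comprehension, which many Python programmers find clearer.
also_new = [s for s in old_list if s]

print(new_list)
```

Both forms drop every empty string and keep the remaining items in their original order.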
You can mash strings together into one string using:
a_string = "".join(list_of_strings)
That will simply concatenate them, but you can use any string as the separator, for example "\t".join(list_of_strings).
If you have several lists (of whatever) and you want to join them together into one list, then:
new_list = reduce(lambda x, y: x+y, old_list)
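The flattening and joining steps can be sketched for Python 3 as follows. Note that reduce has to be imported from functools there, and that itertools.chain.from_iterable is a common alternative for flattening. The sample lists are invented for illustration:

```python
from functools import reduce
from itertools import chain

# Hypothetical sample: one list of strings per input line.
lists = [["22", "packs"], ["5.85"], ["128.70"]]

# Flatten with reduce, as described above.
flat_reduce = reduce(lambda x, y: x + y, lists)

# Flatten with itertools, which avoids building intermediate lists.
flat_chain = list(chain.from_iterable(lists))

# Join the flattened strings with a tab as the separator.
line = "\t".join(flat_chain) + "\n"
print(repr(line))
```

For a handful of short lists the two approaches behave the same; chain.from_iterable tends to scale better when there are many lists.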
If you're new to Python, then functions like filter and reduce may seem a bit alien, but they save a lot of time coding, so it's worth getting to know them. (EDIT: in Python 3, reduce is no longer a built-in; import it from functools.)
I think you're on the right track to solving your problem. I'd do this:
Personally, I'd make a class to handle the last two parts (they kind of belong together logically) but you could get by without it.
Upvotes: 0
Reputation: 10673
You don't need to close the file when using with; the with statement closes it automatically when the block exits.
If I were to implement this, I might use one big regex to extract the parts from each chunk (with finditer), and then reassemble them for output.
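A rough sketch of that regex idea is below. It rests on several assumptions drawn only from the single sample chunk in the question: fields are separated by tabs, spaces, or newlines; part numbers contain no whitespace; prices contain only digits and a dot; and the two empty columns in the desired output are always empty. The names CHUNK and reformat are made up for this example.

```python
import re

# One verbose pattern per order chunk; named groups mark the parts to keep.
CHUNK = re.compile(r"""
    ^\d+\s+                                   # line-item number (ignored)
    (?P<part>\S+)\s+                          # vendor part number
    (?P<desc>[^\n]*?)\s*\n                    # description: rest of the first line
    Your\ Part\ Number:\s*(?P<mine>\S+)\s*\n  # the buyer's own part number
    (?P<qty>\d+)\s+                           # quantity
    (?P<unit>\S+)\s+\S+\s+                    # unit, then a ship-status word (ignored)
    (?P<price>[\d.]+)\s+                      # unit price
    per\s+\S+\s+(?P<total>[\d.]+)             # "per <unit>", then the line total
""", re.VERBOSE | re.MULTILINE)

def reformat(text):
    """Return one tab-delimited line per chunk found in text."""
    out = []
    for m in CHUNK.finditer(text):
        # Column order follows the desired output in the question;
        # the two "" entries are the columns assumed to stay empty.
        fields = [m.group("qty"), m.group("unit"), m.group("desc"), "",
                  m.group("part"), m.group("price"), "",
                  m.group("total"), m.group("mine")]
        out.append("\t".join(fields) + "\n")
    return "".join(out)

# The sample chunk from the question, with tabs assumed between fields.
sample = (
    '1\t92351A603\tLag Screw for Wood, 18-8 Stainless Steel, '
    '5/16" Diameter, 5" Long, packs of 5\n'
    "Your Part Number: 7218-GYROID\n"
    "22\n"
    "packs\ttoday\n"
    "5.85\n"
    "per pack\t128.70\n"
)
result = reformat(sample)
print(repr(result))
```

Reading the whole file with f.read() and passing it to reformat handles any number of chunks in one go. The named groups also answer the re-ordering concern, since m.groupdict() gives a dictionary keyed by field name.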
Upvotes: 1