Reputation: 15
I apologize that this question is somewhat vague; I'm very new to Python.
I need to parse a tab-delimited text file. It's a very large file, and from it I am trying to identify and extract specific things. For example, if one line were:
[apple banana cherry date]
I want to search for and identify the term "apple" and then extract the term "date".
Then I need to access the list of extracted terms and use them (for comparisons with other lists, etc.).
I have read about regular expressions, but while they seem good for searching, I don't know how to use them to extract terms other than the searched keyword. Also, I'm not sure how to access or manipulate the array of results after parsing.
Any help/direction/pointers/suggestions/examples would be amazing.
Thank you so much!
Upvotes: 1
Views: 954
Reputation: 473863
If a file is tab-delimited, that's usually a sign to use the csv module:
>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     reader = csv.reader(csvfile, dialect=csv.excel_tab)
...     for row in reader:
...         print(row)
It's hard to say more without any specific example.
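A minimal sketch of how the csv approach could cover the whole task from the question. It assumes, hypothetically, that the term to extract always sits in the fourth column (index 3) of any row containing the search term, and it uses an in-memory `io.StringIO` sample in place of the real file. The helper name `extract_terms` is made up for illustration.

```python
import csv
import io

def extract_terms(lines, keyword, column):
    """Return the value in `column` for every row that contains `keyword`.

    `lines` can be any iterable of text lines, including an open file.
    """
    extracted = []
    for row in csv.reader(lines, dialect=csv.excel_tab):
        # Only keep rows that contain the keyword and are long enough
        if keyword in row and len(row) > column:
            extracted.append(row[column])
    return extracted

# Hypothetical sample data standing in for the large tab-delimited file
sample = io.StringIO(
    "apple\tbanana\tcherry\tdate\n"
    "fig\tgrape\tkiwi\tlemon\n"
    "apple\tmango\tnectarine\torange\n"
)

terms = extract_terms(sample, "apple", 3)
print(terms)  # ['date', 'orange']

# The result is an ordinary list, so it can be indexed, looped over,
# or compared with other lists, e.g. via set operations
other_list = ["date", "plum"]
print(set(terms) & set(other_list))  # {'date'}
print(set(terms) - set(other_list))  # {'orange'}
```

With a real file, pass the open file object instead, e.g. `with open('data.txt', newline='') as f: terms = extract_terms(f, 'apple', 3)` (`data.txt` is a placeholder name). Note that `keyword in row` compares whole fields, so "apple" will not match "pineapple".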
Upvotes: 3
Reputation: 3393
http://docs.python.org/2/library/re.html
Here's a simple example:
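import re
# This regular expression detects base64-encoded PNG images embedded as data URIs
regex = r'(?P<src>data:image/png;base64,(?P<data>[^"]*))'
your_input_string = '<img src="data:image/png;base64,iVBORw0KGgo=">'  # sample input
# you can then either
# a) find every match; with two groups, each match is a (src, data) tuple
matches = re.findall(regex, your_input_string)
for m in matches:
    # address your matches with index notation
    src = m[0]
    data = m[1]
# b) take the first match (re.search returns None if nothing matches)
match = re.search(regex, your_input_string)
if match:
    # address your matches by group name
    src = match.group('src')
    data = match.group('data')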
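Applied to the question's example, a capture group is what lets a regex return a term other than the one searched for. A minimal sketch, assuming the keyword is the first tab-separated field and the wanted term is the fourth; the pattern, the sample text and the `extract_fourth_field` name are illustrative only.

```python
import re

def extract_fourth_field(text, keyword):
    """Return the fourth tab-separated field of every line whose first field is `keyword`."""
    # ^ anchors at each line start (re.MULTILINE); [^\t\n]* matches one field
    # without crossing a tab or a line break; only the fourth field is captured
    pattern = re.compile(
        r'^' + re.escape(keyword) + r'\t[^\t\n]*\t[^\t\n]*\t([^\t\n]*)',
        re.MULTILINE,
    )
    # With a single capture group, findall returns a list of the captured strings
    return pattern.findall(text)

# Hypothetical sample text in the question's format
text = (
    "apple\tbanana\tcherry\tdate\n"
    "fig\tgrape\tkiwi\tlemon\n"
    "apple\tmango\tnectarine\torange\n"
)

terms = extract_fourth_field(text, "apple")
print(terms)  # ['date', 'orange']
```

For a strictly tab-delimited file, splitting rows into fields (e.g. with the csv module) is usually simpler to maintain than a positional regex like this one; the regex mainly helps when the keyword and target follow a textual pattern rather than fixed columns.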
Upvotes: 1