user2545406

Reputation: 15

Parsing a tab delimited file

I apologize that this question is somewhat vague; I'm very new to Python.

I need to parse a tab-delimited text file. It's a very large file, and from it I am trying to identify and extract specific terms. For example, if one line was:

[apple banana cherry date]

I want to search for and identify the term "apple" and then extract the term "date".

Then I need to access the list of extracted terms and use them (for comparisons with other lists, etc.).

I have read about regular expressions, but while they seem good for searching, I don't know how to use them to extract terms other than the searched keyword. Also, I'm not sure how to access and manipulate the array of results after parsing.

Any help/direction/pointers/suggestions/examples would be amazing.

Thank you so much!

Upvotes: 1

Views: 954

Answers (2)

alecxe

Reputation: 473863

If a file is tab-delimited, that's usually a sign to use the csv module:

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     reader = csv.reader(csvfile, dialect=csv.excel_tab)
...     for row in reader:
...         print(row)

It's hard to say more without any specific example.
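Applied to the example in the question, a minimal sketch might look like the following. It assumes that "apple" must match a whole field exactly and that the wanted term is always the fourth field (index 3); `extract_terms` is a hypothetical helper name, not part of the csv module. Because csv.reader consumes its input one row at a time, this also works on a very large file without loading it all into memory.

```python
import csv


def extract_terms(lines, key, want_index):
    """Collect row[want_index] from every tab-delimited row containing `key`.

    `lines` can be an open file object or any iterable of strings.
    Assumption: `key` must equal a whole field, not just part of one.
    """
    reader = csv.reader(lines, delimiter='\t')
    extracted = []
    for row in reader:
        # `key in row` is True only if some field equals `key` exactly;
        # the length check skips short rows instead of raising IndexError.
        if key in row and len(row) > want_index:
            extracted.append(row[want_index])
    return extracted


sample = [
    "apple\tbanana\tcherry\tdate\n",
    "fig\tgrape\tkiwi\tlemon\n",
]
terms = extract_terms(sample, "apple", 3)
print(terms)  # ['date']

# The result is an ordinary list; sets make comparisons with other lists easy.
other = ["date", "plum"]
print(sorted(set(terms) & set(other)))  # in both lists: ['date']
print(sorted(set(other) - set(terms)))  # only in `other`: ['plum']
```

With a real file (the name `data.txt` is hypothetical), pass the open file object instead of the sample list: `with open('data.txt', newline='') as f: terms = extract_terms(f, 'apple', 3)`.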

Upvotes: 3

Lorcan O'Neill

Reputation: 3393

http://docs.python.org/2/library/re.html

Here's a simple example:

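import re

# This regular expression detects base64-encoded PNG images in data URIs.
# Python names groups with (?P<name>...), so both groups use that syntax.
regex = re.compile(r'(?P<src>data:image/png;base64,(?P<data>[^"]*))')

# hypothetical sample input; replace with your own text
your_input_string = '<img src="data:image/png;base64,iVBORw0KGgo=">'

# you can then either
# a) find all matches; with two groups, each match is a (src, data) tuple
matches = regex.findall(your_input_string)
for m in matches:
    # address your matches with index notation
    src = m[0]
    data = m[1]
# b) search for the first match and access its groups by name
match = regex.search(your_input_string)
if match:
    src = match.group('src')
    data = match.group('data')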
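For the question's concern about extracting a term other than the searched keyword, a capturing group does exactly that: the pattern matches the keyword, but only the group's contents are returned. The sketch below assumes "apple" is the first field and the wanted term is the fourth; `LINE_RE` and `extract_with_regex` are hypothetical names for illustration.

```python
import re

# [^\t]* matches one field (any run of non-tab characters).
# Assumed layout: first field is "apple"; capture the fourth field.
LINE_RE = re.compile(r'^apple\t[^\t]*\t[^\t]*\t(?P<wanted>[^\t]*)')


def extract_with_regex(lines):
    """Return the fourth field of every line whose first field is 'apple'."""
    results = []
    for line in lines:
        m = LINE_RE.match(line.rstrip('\n'))
        if m:
            # Only the captured group is kept, not the searched keyword.
            results.append(m.group('wanted'))
    return results


sample = [
    "apple\tbanana\tcherry\tdate\n",
    "fig\tgrape\tkiwi\tlemon\n",
    "apple\tmango\tnut\torange\n",
]
print(extract_with_regex(sample))  # ['date', 'orange']
```

For plain tab-delimited data the csv approach in the other answer is usually simpler; a regex pays off when the match condition is more complex than "field equals X".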

Upvotes: 1
