user1985351

Reputation: 4689

Extracting data from a text file with Python

I have a large text file that contains records in the following format:

|NAME|NUMBER(1)|AST|TYPE(0)|TYPE|NUMBER(2)||NUMBER(3)|NUMBER(4)|DESCRIPTION|

Sorry for the vagueness. Every record is formatted like the above, with the separator '|' between descriptors. I want to be able to search the file for the 'NAME' and then print each descriptor under its own label, as in this example:

Name
Number(1):
AST:
TYPE(0):
etc....

In case that is still unclear: I want to be able to search for a name and then print out the fields that follow it, which are separated by '|'.

Can anyone help?

EDIT: Here is an example of part of the text file:

|Trevor Jones|70|AST|White|Earth|3||500|1500|Old Man Living in a retirement home|

This is the code I have so far:

with open('LARGE.TXT') as fd:
    name='Trevor Jones'
    input=[x.split('|') for x in fd.readlines()]
    to_search={x[0]:x for x in input}
    print('\n'.join(to_search[name]))

Upvotes: 4

Views: 3367

Answers (2)

jhoyla

Reputation: 1251

Something like this:

# Open the file with a context manager so it is closed automatically
with open('large_text_file') as fd:
    # Split each line into fields; strip() drops the newline first,
    # then strip('|') removes the leading and trailing pipes
    records = [x.strip().strip('|').split('|') for x in fd]
    # Build a dictionary keyed by name so records can be looked up
    to_search = {x[0]: x for x in records}

and then search with

to_search[NAME]

Depending on the format you want the answers in, use

print(' '.join(to_search[NAME]))

or

print('\n'.join(to_search[NAME]))

A word of warning: this solution assumes that the names are unique. If they aren't, a more complex solution may be required.
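If names can repeat, one option is to map each name to a list of records rather than to a single record. This is only a minimal sketch, assuming the pipe-delimited layout from the question; `parse_line` and `build_index` are hypothetical helper names, and the second sample line is invented purely to show a repeated name.

```python
from collections import defaultdict


def parse_line(line):
    """Split one '|'-delimited record into a list of fields."""
    # Drop the newline first, then the leading and trailing pipes.
    return line.strip().strip('|').split('|')


def build_index(lines):
    """Map each name to a list of all records that carry that name."""
    index = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = parse_line(line)
        index[fields[0]].append(fields)
    return index


# Illustrative data only; the second record is made up to show a duplicate name.
sample = [
    "|Trevor Jones|70|AST|White|Earth|3||500|1500|Old Man Living in a retirement home|\n",
    "|Trevor Jones|35|AST|White|Mars|1||200|900|Invented duplicate record|\n",
]
index = build_index(sample)
for record in index["Trevor Jones"]:
    print('\n'.join(record))
    print()
```

With a real file, `build_index(fd)` could be called inside the `with open(...)` block instead of passing the sample list. Looking up a name then returns every matching record, so nothing is silently overwritten.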

Upvotes: 2

pydsigner

Reputation: 2885

First you need to break the file up somehow. I think that a dictionary is the best option here. Then you can get what you need.

d = {}
# `fl` is our file object; 'LARGE.TXT' is the file name used in the question
with open('LARGE.TXT') as fl:
    for L in fl:
        # Drop the newline, then skip the first pipe
        detached = L.strip()[1:].split('|')
        # May wish to process here
        d[detached[0]] = detached[1:]
# Can do whatever with this information now
print(d.get('string_to_search'))
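To get the labelled output the question asks for, the parsed fields could be paired with the field names. The sketch below assumes the label order given in the question's format line; `LABELS` and `format_record` are hypothetical names, and the empty label stands for the unnamed column between NUMBER(2) and NUMBER(3).

```python
# Field labels taken from the format line in the question; the empty
# string stands for the unnamed field between NUMBER(2) and NUMBER(3).
LABELS = ['NAME', 'NUMBER(1)', 'AST', 'TYPE(0)', 'TYPE',
          'NUMBER(2)', '', 'NUMBER(3)', 'NUMBER(4)', 'DESCRIPTION']


def format_record(line):
    """Return 'LABEL: value' lines for one '|'-delimited record."""
    fields = line.strip().strip('|').split('|')
    out = []
    for label, value in zip(LABELS, fields):
        if label:  # skip the unnamed empty column
            out.append('{}: {}'.format(label, value))
    return '\n'.join(out)


line = "|Trevor Jones|70|AST|White|Earth|3||500|1500|Old Man Living in a retirement home|"
print(format_record(line))
```

Note that `zip` stops at the shorter of the two sequences, so a record with fewer fields than labels simply prints fewer lines rather than raising an error.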

Upvotes: 2
