Reputation: 8554
I have several text files that contain specific information.I need to extract the required info from the files and put them in a MySQL table. The files contain several lines of info but I only need these three lines for instance:
Name: Gorge
Registration ID: 6657
Registration Time: 2012-09-10 14:31:13
I wrote the below code but the result of the code is not what I want. The code still do not contain the insert for SQL part.
import fnmatch
import os
import pprint
matches=[]
b=[]
for root, dirnames, filenames in os.walk('d:/Data'):
for filename in fnmatch.filter(filenames, 'Info_reg.txt'):
matches.append(os.path.join(root, filename))
all_keys = ['name','Registration ID','registration time']
for m in matches:
f=open(m,'r')
for line in f:
for n in all_keys:
if line.startswith(n):
a = line.split(':',1)
b.append(a)
and the code result is like below which I assume cannot easily convert to a table:
['registration time', ' 2012-10-08 17:28:47\n'],
['Registration ID', ' 9876'],
['Name', ' Malcom\n'],
['registration time', ' 2012-10-08 17:28:47\n'],
['Registration ID', ' 45'],
['Name', 'mazu\n'],
Does anyone have an idea how can I change my code to make a nice table out of this file?
Upvotes: 0
Views: 176
Reputation: 10923
This may inspire a solution:
data = '''\
Name: Gorge
Registration ID: 6657
Registration Time: 2012-09-10 14:31:13
Somethign else: foo
Spam: Bar
Name: mazu
Registration ID: 45
Registration Time: 2012-10-08 17:28:47
Somethign else: foo
Spam: Bar'''.split('\n')
records = []
titles = ['Name','Registration ID','Registration Time']
def record_is_complete(rec):
return (rec.get('Name')
and rec.get('Registration ID')
and rec.get('Registration Time'))
def make_tuple(rec):
result = [ rec[key] for key in titles ]
return tuple(result)
record = {}
for line in data:
key, value = line.split(':', 1)
if key in titles:
record[key] = value.strip()
if record_is_complete(record):
records.append(make_tuple(record))
record = {}
print records
Result:
[('Gorge', '6657', '2012-09-10 14:31:13'),
('mazu', '45', '2012-10-08 17:28:47')]
Upvotes: 1
Reputation: 1122252
You want to call .strip()
on the results, and store the whole thing in a dictionary instead of a list of lists.
We can also optimize the line search and record handling; I am assuming here that when we find a Name
entry a new record has started:
records = []
all_keys = {'Name', 'Registration ID', 'registration time'}
first_key = 'Name'
for m in matches:
with open(m, 'r') as f
record = dict.fromkeys(all_keys) # new record dictionary with `None` values
for line in f:
key, value = line.split(':', 1)
key, value = key.strip(), value.strip()
if key not in all_keys:
continue # not interested in this line
if key == first_key and any(v for v in record.itervalues()):
# new record, finalize the previous
records.append(record)
record = dict.fromkeys(all_keys)
record[key] = value
if any(v for v in record.itervalues()):
# there is something in the last record still, add that too
records.append(record)
Now you have a list of records of the form:
records = [
{'registration time', '2012-10-08 17:28:47', 'Registration ID': '9876', 'Name', 'Malcom'},
{'registration time', '2012-10-08 17:28:47', 'Registration ID': '45', 'Name', 'mazu'},
]
These can be inserted into a database with MySQLdb using .executemany()
in one go:
cursor = conn.cursor()
cursor.executemany('INSERT INTO sometable (id, name, time) VALUES (%(Registration ID)s, %(Name)s, %(registration time)s)',
records)
conn.commit()
This inserts all collected records straight into the database.
Upvotes: 3