Effectively querying DBF files with Python

Question

I have the need to read from a legacy VFP DBF database and gather all rows which have an etd within the current week.

I am using dbf however it seems that when querying the table, it begins the query at the very first record in the table. This causes performance issues when attempting to find data within the last week, as it has to iterate over every line in the database (60k+) every time it runs.

table = dbf.Table(r'\server\file.dbf')

table.open()

for row in table:

    if (self.monday < row.etd < self.friday) and ('LOC' not in row.route):
        self.datatable.Rows.Add(row.manifest, row.route, row.etd, row.eta, row.inst, row.subname)
    else:
        continue

I tried to "reverse" the table with for row in table[::-1]:

However, this takes the same amount of time as I believe it needs to load the database into memory prior to the [::-1]

What would be a more efficient way to query these DBF files?

Ethan Furman · Accepted Answer

As you know, dbf does not support index files. It does, however, have some methods reminiscent of VFP that could help:

# untested

table = ...

potental_records = []
with table:               # auto opens and closes
    table.bottom()        # goes to end of table
    while True:
        table.skip(-1)    # move to previous record
        row = table.current_record
        if self.monday > row.etd:
            # gone back beyond range
            break
        elif row.etd < self.friday:
            potential_records.append(row)

# at this point the table is closed and potential_records should have all
# records in the etd range.

The above will only work if the records are physically ordered by etd.

Effectively querying DBF files with Python

Answers (1)

Related Questions