Bdfy

Reputation: 24621

How to find substring in file?

How can I find a string in a binary file using only read(1)? For example, I want to find the position of the string 'abst' in a file without loading the whole file into memory. The following works, but it is very primitive:

#!/usr/bin/python2
f = open("/tmp/rr", "rb")

found = False
while not found:
    a1 = f.read(1)
    if a1 == '':
        break  # end of file reached without a match
    if a1 == 'a':
        a2 = f.read(1)
        if a2 == 'b':
            a3 = f.read(1)
            if a3 == 's':
                a4 = f.read(1)
                if a4 == 't':
                    found = True
f.close()

Upvotes: 1

Views: 5073

Answers (4)

Rumple Stiltskin

Reputation: 10395

Will this work for you?

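#!/usr/bin/python

string = "abst"
f = open("/tmp/rr", "rb")

idx = 0  # number of characters of `string` matched so far
while True:
    c = f.read(1)
    if c == '':
        break  # end of file
    if c == string[idx]:
        idx += 1
    elif c == string[0]:
        # Restart the match at this character. This simple reset is
        # only safe when the first character does not recur inside
        # the pattern, as is the case for "abst".
        idx = 1
    else:
        idx = 0
    if idx == len(string):
        print "Found"
        break
f.close()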
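
The reset logic above relies on the pattern's first character not recurring inside the pattern. For example, it would miss "aab" in "aaab". A minimal sketch of a general version, still reading one byte at a time, swaps in the Knuth-Morris-Pratt failure table. It is written for Python 3 (bytes instead of str), and the names build_failure_table and find_in_stream are made up for this illustration, not taken from the answer:

```python
import io


def build_failure_table(pattern):
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def find_in_stream(stream, pattern):
    """Return the offset (relative to the stream's starting position)
    of the first occurrence of `pattern`, or -1 if it never occurs.
    `stream` is a binary file object and `pattern` is a bytes object."""
    if not pattern:
        return 0
    failure = build_failure_table(pattern)
    matched = 0   # bytes of the pattern matched so far
    offset = 0    # bytes consumed from the stream so far
    while True:
        c = stream.read(1)
        if not c:
            return -1  # end of file
        offset += 1
        # On a mismatch, fall back to the longest prefix that still fits.
        while matched and c[0] != pattern[matched]:
            matched = failure[matched - 1]
        if c[0] == pattern[matched]:
            matched += 1
        if matched == len(pattern):
            return offset - len(pattern)
```

With a real file this would be called as find_in_stream(open("/tmp/rr", "rb"), b"abst"). The file is never rewound and only the small failure table is kept in memory.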

Upvotes: 2

jan zegan

Reputation: 1657

If your file consists mostly of 'a's (or whatever the first character of your search string is), this algorithm will perform poorly, because every such character triggers an extra read and a seek back. Otherwise it works pretty well.

check = 'abst'
col=1
row=1
location = (-1, -1)

with open("/tmp/rr", 'rb') as p:
    ch = p.read(1)
    while(ch != ""):
        if ch == check[0]:
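            st = p.read(len(check)-1)
            if ch+st == check:
                location = (row, col)
                break
            else:
                # Rewind only what was actually read; near the end of
                # the file read() may return fewer bytes than requested.
                p.seek(-len(st), 1)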

        ch = p.read(1)
        col+=1

        if ch == '\n':
            col=0
            row+=1

print("loc: {}, {}".format(*location))

Upvotes: 0

Niklas R

Reputation: 16860

You can find a substring by using the string's find method.

with open("/tmp/rr", "rb") as f:
    content = f.read()  # loads the whole file into memory
name = 'abst'
start = content.find(name)  # -1 if name does not occur
if start != -1:
    span = (start, start + len(name))

The read(1) approach is absolutely senseless. (See the edit.)

Edit: a more memory-efficient version

def find(file, name):
    length = len(name)
    part = file.read(length)
    i = 0
    while True:
        if part == name:
            break
        char = file.read(1)
        if not char:
            return
        part = part[1:] + char
        i += 1
    return i, i + length, part

I see, using read(1) isn't that senseless.
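
A possible middle ground between reading the whole file and calling read(1) for every byte is to read fixed-size chunks and carry over the last len(name) - 1 bytes, so a match that straddles two chunks is still found. This is only a sketch under assumptions: it targets Python 3 (bytes), and the function name find_chunked and the default chunk size are arbitrary choices for the illustration.

```python
import io


def find_chunked(stream, pattern, chunk_size=64 * 1024):
    """Return the offset of the first occurrence of `pattern` (bytes)
    in the binary `stream`, or -1. Memory use is bounded by roughly
    chunk_size + len(pattern) bytes."""
    if not pattern:
        return 0
    overlap = len(pattern) - 1
    tail = b""   # bytes carried over from the previous window
    base = 0     # stream offset of the first byte of `tail`
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1  # end of file
        window = tail + chunk
        pos = window.find(pattern)
        if pos != -1:
            return base + pos
        # Keep just enough bytes to catch a match spanning two chunks.
        keep = window[-overlap:] if overlap else b""
        base += len(window) - len(keep)
        tail = keep
```

Usage would look like find_chunked(open("/tmp/rr", "rb"), b"abst"). Larger chunks mean fewer read calls at the cost of a bigger buffer.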

Upvotes: 1

phihag

Reputation: 287775

Use mmap to search the file with constant memory requirements:

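import mmap
with open('/tmp/rr', 'rb') as f:
    # Map the file read-only; access= works on both Unix and Windows.
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    position = m.find('abst')  # mmap objects provide find(), not index(); -1 if not found
    m.close()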
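
For Python 3, the same idea might look like the sketch below. The search pattern has to be a bytes object, and an empty file needs special handling because it cannot be mapped. The helper name find_with_mmap and the throwaway sample file are assumptions made for this example.

```python
import mmap
import os
import tempfile


def find_with_mmap(path, pattern):
    """Return the byte offset of the first occurrence of `pattern`
    (bytes) in the file at `path`, or -1 if it does not occur."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return -1  # an empty file cannot be memory-mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m.find(pattern)


# Usage sketch with a temporary file standing in for /tmp/rr.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01 header abst trailer")
    sample_path = tmp.name

position = find_with_mmap(sample_path, b"abst")
missing = find_with_mmap(sample_path, b"nope")
os.remove(sample_path)
```

The operating system pages the file in on demand, so the process does not hold the whole file in a Python string.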

Upvotes: 4
