Reputation: 24621

How to find a substring in a file?

How can I find a string in a binary file using only read(1)? For example, I want to find the position of the string 'abst' in a file (without loading the file into memory). This works, but it is very primitive:

#!/usr/bin/python2
f = open("/tmp/rr", "rb")
f.seek(0)

cont = 1
while(cont):
    a1 = f.read(1)
    if a1 == 'a':
        a2 = f.read(1)
        if a2 == 'b':
            a3 = f.read(1)
            if a3 == 's':
                a4 = f.read(1)
                if a4 == 't':
                    found = True
                    cont = 0

Upvotes: 1

Answers (4)

Rumple Stiltskin

Reputation: 10395

Will this work for you?

#!/usr/bin/python

string = "abst"
f = open("/tmp/rr", "rb")
f.seek(0)

idx = 0  # number of characters of `string` matched so far
while True:
    c = f.read(1)
    if c == '':
        break
    if c == string[idx]:
        idx += 1
    elif c == string[0]:
        idx = 1
    else:
        idx = 0
    if idx == len(string):
        print "Found"
        break
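
The loop above only prints "Found", and its fallback to `string[0]` can miss a match when the pattern overlaps itself (for example `aab` inside `aaab`). That is not an issue for `abst`. As a minimal sketch, and not part of the original answer, the same one-byte-at-a-time idea can be combined with a Knuth-Morris-Pratt failure table so that it also reports the byte offset. It is written for Python 3, where a file opened in `'rb'` mode yields `bytes`; the names `build_failure_table` and `find_in_stream` are made up for illustration.

```python
import io


def build_failure_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it (the usual KMP failure table)."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def find_in_stream(f, pattern):
    """Return the offset of the first match of `pattern` (bytes) in the
    binary stream `f`, counted from the stream's current position,
    or -1 if there is no match. Reads one byte at a time."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    matched = 0  # number of pattern bytes matched so far
    offset = 0   # number of bytes consumed from the stream
    while True:
        c = f.read(1)
        if not c:  # end of file
            return -1
        offset += 1
        # On a mismatch, fall back to the longest prefix that still fits.
        while matched and c[0] != pattern[matched]:
            matched = table[matched - 1]
        if c[0] == pattern[matched]:
            matched += 1
        if matched == len(pattern):
            return offset - len(pattern)


if __name__ == "__main__":
    # In-memory stand-in for open("/tmp/rr", "rb")
    print(find_in_stream(io.BytesIO(b"xxaabst"), b"abst"))
```

Because the stream is never rewound, this variant should also work on pipes and sockets, at the cost of one `read(1)` call per byte.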

Upvotes: 2

jan zegan

Reputation: 1657

If your file is mostly filled with 'a's, or whatever character corresponds to the first character in the string you're searching for, this algo will suck big time, otherwise works pretty well.

check = 'abst'
col=1
row=1
location = (-1, -1)

with open("/tmp/rr", 'rb') as p:
    ch = p.read(1)
    while(ch != ""):
        if ch == check[0]:
            st = p.read(len(check)-1)
            if ch+st == check:
                location = (row, col)
                break
            else:
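                # Rewind only as far as we actually read; near the end of
                # the file st can be shorter than len(check)-1.
                p.seek(-len(st), 1)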

        ch = p.read(1)
        col+=1

        if ch == '\n':
            col=0
            row+=1

print("loc: {}, {}".format(*location))

Upvotes: 0

Niklas R

Reputation: 16860

You can find a substring by using the string's find method.

with open('/tmp/rr', 'rb') as f:
    content = f.read()
name = 'abst'
start = content.find(name)  # -1 if name does not occur
if start != -1:
    span = start, start + len(name)

The read(1) approach is absolutely senseless. (See edit.)

Edit: a more memory-efficient version

def find(file, name):
    length = len(name)
    part = file.read(length)
    i = 0
    while True:
        if part == name:
            break
        char = file.read(1)
        if not char:
            return
        part = part[1:] + char
        i += 1
    return i, i + length, part

I see, using read(1) isn't that senseless.
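
A middle ground between reading the whole file and calling `read(1)` per byte is to read fixed-size chunks and carry the last `len(name) - 1` bytes over to the next chunk, so that a match straddling a chunk boundary is still found. The following is only a sketch under assumptions: it targets Python 3 (`bytes` patterns), and the function name `find_chunked` and the default chunk size are arbitrary choices, not something from the answer above.

```python
import io


def find_chunked(f, pattern, chunk_size=64 * 1024):
    """Return the offset of the first match of `pattern` (bytes) in the
    binary stream `f`, counted from the stream's current position,
    or -1 if there is no match."""
    if not pattern:
        return 0
    keep = len(pattern) - 1  # bytes carried over between chunks
    tail = b""               # end of the previous window
    base = 0                 # stream offset of the first byte of `tail`
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # end of file
            return -1
        window = tail + chunk
        pos = window.find(pattern)
        if pos != -1:
            return base + pos
        # Keep just enough bytes to catch a match that spans two chunks.
        tail = window[-keep:] if keep else b""
        base += len(window) - len(tail)


if __name__ == "__main__":
    # In-memory stand-in for open("/tmp/rr", "rb"); the tiny chunk size
    # forces the match to straddle a chunk boundary.
    print(find_chunked(io.BytesIO(b"xxxabstyyy"), b"abst", chunk_size=4))
```

Memory use is bounded by roughly `chunk_size + len(pattern)` bytes, regardless of the file size.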

Upvotes: 1

phihag

Reputation: 287775

Use mmap to search the file with constant memory requirements:

import mmap
with open('/tmp/rr', 'rb') as f:
  m = mmap.mmap(f.fileno(), 0, mmap.MAP_PRIVATE, mmap.PROT_READ)
  position = m.find('abst')  # byte offset of the first match, or -1 if absent
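
For reference, here is a sketch of the same idea for Python 3, where the pattern has to be `bytes`. It uses `access=mmap.ACCESS_READ`, which is accepted on both Unix and Windows, instead of the Unix-only flags and protection arguments. The helper name `find_with_mmap` and the empty-file guard are additions for illustration, not part of the snippet above.

```python
import mmap
import os
import tempfile


def find_with_mmap(path, pattern):
    """Return the byte offset of the first match of `pattern` (bytes)
    in the file at `path`, or -1 if there is no match."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return -1  # mmap raises ValueError when asked to map an empty file
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return m.find(pattern)
        finally:
            m.close()


if __name__ == "__main__":
    # Throwaway file standing in for /tmp/rr
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(b"\x00\x01xxabst\xff")
    tmp.close()
    print(find_with_mmap(tmp.name, b"abst"))
    os.remove(tmp.name)
```

The operating system pages the file in on demand, so the script never has to hold the whole file in a Python string.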

Upvotes: 4
