Space Ostrich

Reputation: 423

Randomly search a text file for keyword using fastest & efficient string-search method

I've got a text file with one customer record per line. Each record is formatted as "ID num, first name, last name, dollar amount". I need to read a line of this text file based on the ID number entered by the user.

I've been following a Java ebook that does this by taking the length of a single record and multiplying it by the ID number entered. The problem is that this only works if every record has exactly the same length. My records do not truncate or pad out the first and last names, and the dollar amount ranges from two to five characters in length, which means the book's method won't work.

Is there any way to read a specific line in a text file without requiring all the records to be the exact same length? I'd have thought that there would be a way to use the line separator character to do it.

For reference I'll put up the code that doesn't work due to my varying record sizes, in case it helps.

public static void main(String[] args)
{
  Scanner keyboard = new Scanner(System.in);
  Path filepath = Paths.get("U:\\Programming\\Java\\Chapter 13\\customersdata.txt");
  String s = "  , , , 00.00" + System.getProperty("line.separator");
  final int RECSIZE = s.length();
  byte[] data = s.getBytes();
  ByteBuffer buffer = ByteBuffer.wrap(data);
  FileChannel fc = null;

  try {
     fc = (FileChannel)Files.newByteChannel(filepath, READ, WRITE);
     System.out.println("Enter an ID number to display the customer details for that ID. Or \"quit\".");
     String idString = keyboard.nextLine();

     while(!idString.equals("quit")) {
        int id = Integer.parseInt(idString);
        buffer = ByteBuffer.wrap(data);
        fc.position(id * RECSIZE);
        fc.read(buffer);
        s = new String(data);
        System.out.println("ID #" + id + " " + s);
        System.out.println("Enter an ID number to display the customer details for that ID. Or \"quit\".");
        idString = keyboard.nextLine();
     }
     fc.close();
  }catch(Exception e) {
     System.out.println("Error message: " + e);
  }
}

EDIT: The text file being read from could hypothetically contain tens of thousands of records, so I can't use sequential access. If the ID number I need is near the bottom of the file, it would take an unacceptable amount of time to read through all the records before it. The solution must therefore use random access.

Upvotes: 0

Views: 1015

Answers (1)

rupinderjeet

Reputation: 2838

I've got a text file with one customer record per line. Each record is formatted as "ID num, first name, last name, dollar amount". I need to read a line of this text file based on the ID number entered by the user.

and

Is there any way to read a specific line in a text file without requiring all the records to be the exact same length?

In the main method, I hardcoded the ID string in the call readData("33"). You can change it to match an ID in your data.txt to get that record's data.

data.txt

1 harry singh 456
2 lauren dat 25
33 pingle pooh 8797
10002 yogeshvari bahman 897461

parseTxt.java

import java.io.File;
import java.util.Scanner;

public class parseTxt {

    public static void main(String[] args)
    {
        try{
            readData("33");
        } catch(Exception e){
            System.out.println("Exception : " + e);
        }
    }

    private static void readData(String id) throws Exception{
        boolean found = false;

        // try-with-resources closes the Scanner (and the file) when done
        try (Scanner fileReader = new Scanner(new File("E://data.txt"))) {
            while(fileReader.hasNextLine()){
                String line = fileReader.nextLine();
                String[] lineParts = line.trim().split("\\s+");

                // lineParts[0] is ID NUMBER; lines with fewer than 4 fields are skipped
                if(lineParts.length >= 4 && lineParts[0].equals(id)){
                    System.out.println("Customer ID : #" + lineParts[0]);
                    System.out.println("First Name : " + lineParts[1]);
                    System.out.println("Last Name : " + lineParts[2]);
                    System.out.println("Dollar Amount : $" + lineParts[3]);

                    found = true;
                    break;
                }
            }
        }

        // report a miss once, after the whole file has been scanned,
        // instead of once per non-matching line
        if(!found){
            System.out.println("This ID:" + id + " does not exist");
        }
    }
}

For Edited Question (search while keeping good performance)

source-1:

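try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
    ByteBuffer bb = ByteBuffer.allocateDirect(1000);
    StringBuilder line = new StringBuilder();   // kept across reads: a line may span two buffers
    while (ch.read(bb) != -1) {                 // -1 signals end of file
        bb.flip();                              // switch the buffer from filling to draining
        while (bb.hasRemaining()) {
            char c = (char) bb.get();           // assumes a single-byte encoding such as ASCII
            if (c == '\n') {
                // 'line' now holds one complete record: check it here, break once found
                line.setLength(0);
            } else if (c != '\r') {
                line.append(c);
            }
        }
        bb.clear();                             // make room for the next read
    }
    // any text left in 'line' here is a final record without a trailing newline
}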

This takes a bit more code, but it can be considerably faster because of ByteBuffer.allocateDirect, which lets the OS read bytes from the file straight into the ByteBuffer without an intermediate copy.

source-2: Every answer on this link adds bits of information

  1. Convert the search string ('are') to a byte array in the same encoding as the file.
  2. Open a memory-mapped byte buffer from a FileChannel on the file.
  3. Scan the ByteBuffer, looking for matches to the search byte array.
  4. Count the newlines as you go.
  5. Close the FileChannel (a ByteBuffer itself has no close method).
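
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the class and method names (MappedIdSearch, findRecord) are hypothetical, the file is assumed to be ASCII and small enough to map in one piece (under 2 GB), and records are assumed to look like the space-separated data.txt above. Instead of counting newlines, the sketch uses them to mark record boundaries so that the ID is only compared at the start of each line. It is still a linear scan over the mapped bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; the names are illustrative and not from the linked answers.
class MappedIdSearch {

    /** Returns the first line whose first space-separated field equals id, or null. */
    static String findRecord(Path file, String id) throws IOException {
        // Step 1: the search key as bytes, in the file's (assumed) encoding.
        // The trailing space stops "3" from matching a record that starts with "33".
        byte[] key = (id + " ").getBytes(StandardCharsets.US_ASCII);

        // Step 2: map the file; try-with-resources closes the channel (step 5).
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            int limit = buf.limit();
            int lineStart = 0;

            // Steps 3 and 4: walk the buffer one line at a time.
            while (lineStart < limit) {
                int lineEnd = lineStart;
                while (lineEnd < limit && buf.get(lineEnd) != '\n') {
                    lineEnd++;
                }
                if (startsWith(buf, lineStart, lineEnd, key)) {
                    byte[] raw = new byte[lineEnd - lineStart];
                    for (int i = 0; i < raw.length; i++) {
                        raw[i] = buf.get(lineStart + i);
                    }
                    // trim() drops a trailing '\r' from Windows line endings
                    return new String(raw, StandardCharsets.US_ASCII).trim();
                }
                lineStart = lineEnd + 1;
            }
        }
        return null;
    }

    // True if the bytes in [from, to) begin with key.
    private static boolean startsWith(MappedByteBuffer buf, int from, int to, byte[] key) {
        if (to - from < key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (buf.get(from + i) != key[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A call such as MappedIdSearch.findRecord(Paths.get("E://data.txt"), "33") would return the whole matching line, which can then be split into fields as in parseTxt above.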

source-3:

A simple technique that could well be considerably faster than indexOf() is to use a Scanner, with the method findWithinHorizon(). If you use a constructor that takes a File object, Scanner will internally make a FileChannel to read the file. And for pattern matching it will end up using a Boyer-Moore algorithm for efficient string searching.
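
A minimal sketch of the findWithinHorizon() idea, assuming the space-separated layout of data.txt above. The class and method names (ScannerIdSearch, findRecord) are made up for illustration. Note that the pattern here is anchored to the start of a line rather than being a plain literal, so the sketch does not rely on any particular search algorithm being used inside the regex engine. A horizon of 0 means the search is unbounded, in which case Scanner may buffer all of the input.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the linked answer.
class ScannerIdSearch {

    /** Returns the first line whose first space-separated field equals id, or null. */
    static String findRecord(File file, String id) throws IOException {
        // (?m) lets ^ and $ match at line boundaries; Pattern.quote treats id literally.
        Pattern record = Pattern.compile("(?m)^" + Pattern.quote(id) + " .*$");

        try (Scanner sc = new Scanner(file)) {
            // Horizon 0: search without bound; the result is null if nothing matches.
            return sc.findWithinHorizon(record, 0);
        }
    }
}
```

For the sample file, ScannerIdSearch.findRecord(new File("E://data.txt"), "33") would hand back the full record line for ID 33.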

source-4: Implementation of Boyer-Moore's String Search algorithm
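
For a feel of what such an implementation involves, here is a small sketch of the Boyer-Moore-Horspool variant, which is a simplification of full Boyer-Moore that keeps only the bad-character shift table. It is not the code from the linked source, and the class name HorspoolSearch is hypothetical. It works on raw bytes, so it could be applied to the contents of a ByteBuffer copied into an array.

```java
import java.util.Arrays;

// Hypothetical helper showing the Horspool simplification of Boyer-Moore.
class HorspoolSearch {

    /** Returns the index of the first occurrence of needle in haystack, or -1. */
    static int indexOf(byte[] haystack, byte[] needle) {
        int n = haystack.length;
        int m = needle.length;
        if (m == 0) {
            return 0;
        }

        // Shift table: how far the window may slide, keyed by the byte
        // currently aligned with the last position of the needle.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[needle[i] & 0xFF] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare right to left within the current window.
            int j = m - 1;
            while (j >= 0 && haystack[pos + j] == needle[j]) {
                j--;
            }
            if (j < 0) {
                return pos;   // every byte matched
            }
            pos += shift[haystack[pos + m - 1] & 0xFF];
        }
        return -1;
    }
}
```

Searching for the bytes of "\n33 " (newline, ID, space) rather than just "33" keeps the match anchored to the start of a record; a record on the very first line has no preceding newline and would need a separate check.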

I'm sorry, but I'll leave the rest of the research to you. If you want my suggestion, I think GNU grep is the fastest of them all because it, too, uses the Boyer-Moore string search algorithm. Hope this helps! Correct me if I misunderstood your problem.

Upvotes: 1
