Andrei Olar

Reputation: 2358

Read specific data from a .txt file JAVA

I have a problem. I'm trying to read a large .txt file, but I don't need every piece of data that's inside.

My .txt file looks something like this:

8000000 abcdefg hijklmn word word letter

I only need, say, the number and the first two text fields ("abcdefg" and "hijklmn"), which I then want to write to another file. I don't know how to read and write just the data that I need.

Here is my code so far:

    BufferedReader br = new BufferedReader(new FileReader("position2.txt"));
    BufferedWriter bw = new BufferedWriter(new FileWriter("position.txt"));
    String line;

    while ((line = br.readLine())!= null){
        if(line.isEmpty() || line.trim().equals("") || line.trim().equals("\n")){
            continue;
        }else{
            //bw.write(line + "\n");
            String[] data = line.split(" ");
            bw.write(data[0] + " " + data[1] + " " + data[2] + "\n");
        }

    }

    br.close();
    bw.close();

}

Can you give me some suggestions? Thanks in advance.

UPDATE: My .txt files are a bit weird. The code above works great when there is exactly one " " between the words. My files can have a \t, several spaces, or a \t plus some spaces between the words. How can I proceed now?

Upvotes: 2

Views: 18596

Answers (4)

QuakeCore

Reputation: 1936

else {
     String[] res = line.split(" ");
     bw.write(res[0] + " " + res[1] + " " + res[2] + "\n"); // the first three words...
}

Upvotes: 0

smttsp

Reputation: 4191

If your files are really huge (above 50-100 MB, maybe GBs), you are sure that the first word is a number, and you need the two words after it, I would suggest reading one line at a time and iterating through that string. Stop when you find the third space.

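String str = br.readLine(); // br is the BufferedReader from the question
int num_spaces = 0, cnt = 0;
// Start with empty strings; appending to null elements would produce "null...".
String[] arr = {"", "", ""};
// Stop at the third space, or at the end of a line with fewer than three spaces.
while (num_spaces < 3 && cnt < str.length()) {
    if (str.charAt(cnt) == ' ') {
        num_spaces++;
    } else {
        arr[num_spaces] += str.charAt(cnt);
    }
    cnt++; // advance to the next character
}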

If your data is only a couple of MB, or has a lot of numbers inside, there is no need to worry about iterating char by char. Just read line by line, split each line, and then check the words, as already mentioned.
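The scan-and-stop idea above can also be written so that it copes with the tabs and repeated spaces from the question's update. This is only a rough sketch: the class `FieldScanner` and the method `firstFields` are names made up for illustration, and it assumes that any run of whitespace counts as one separator.

```java
import java.util.Arrays;

public class FieldScanner {

    // Hypothetical helper: scans the line from the left and stops as soon as
    // `count` fields have been collected, so the rest of the line is never read.
    // Returns null when the line holds fewer than `count` fields.
    public static String[] firstFields(String line, int count) {
        String[] fields = new String[count];
        int pos = 0;
        int found = 0;
        int n = line.length();
        while (found < count) {
            // Skip the separator run (spaces, tabs, or a mix of both).
            while (pos < n && Character.isWhitespace(line.charAt(pos))) {
                pos++;
            }
            if (pos == n) {
                break; // ran out of characters before finding enough fields
            }
            int start = pos;
            // Advance to the end of the current field.
            while (pos < n && !Character.isWhitespace(line.charAt(pos))) {
                pos++;
            }
            fields[found++] = line.substring(start, pos);
        }
        return found == count ? fields : null;
    }

    public static void main(String[] args) {
        // Sample line mixing a tab and repeated spaces (assumed input shape).
        String line = "8000000 \t abcdefg  hijklmn word word letter";
        System.out.println(Arrays.toString(firstFields(line, 3)));
    }
}
```

Using `substring` instead of appending one char at a time avoids creating a new String for every character, which is where most of the cost would go on very long lines.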

Upvotes: 0

JorgeZ

Reputation: 208

Assuming all lines of your text file follow the structure you described, you could do this (replace FILE_PATH with your actual file path):

public static void main(String[] args) {
    try {
        Scanner reader = new Scanner(new File("FILE_PATH/myfile.txt"));
        PrintWriter writer = new PrintWriter(new File("FILE_PATH/myfile2.txt"));
        while (reader.hasNextLine()) {
            String line = reader.nextLine();
            String[] tokens = line.split(" ");

            writer.println(tokens[0] + ", " + tokens[1] + ", " + tokens[2]);
        }
        writer.close();
        reader.close();
    } catch (FileNotFoundException ex) {
        System.out.println("Error: " + ex.getMessage());
    }
}

You'll get something like: word0, word1, word2

Upvotes: 0

Andreas

Reputation: 159185

Depending on the complexity of your data, you have a few options.

If the lines are simple space-separated values as shown, the simplest approach is to split the text and write the values you want to keep to the new file:

try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        String[] values = line.split(" ");
        if (values.length >= 3)
            bw.write(values[0] + ' ' + values[1] + ' ' + values[2] + '\n');
    }
}

If the values might be more complex, you could use a regular expression:

Pattern p = Pattern.compile("^(\\d+ \\w+ \\w+)");
try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        Matcher m = p.matcher(line);
        if (m.find())
            bw.write(m.group(1) + '\n');
    }
}

This ensures that the first value is digits only, and that the second and third values are word characters only (a-z A-Z _ 0-9).
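For the mixed tabs and spaces described in the question's update, one option is to split on the regex `\\s+` (one or more whitespace characters) instead of a single space. The following is a minimal sketch, not a definitive implementation: the class `FirstThreeFields` and its methods `firstThree` and `copy` are hypothetical names, and the demo reads from an in-memory sample rather than a real file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class FirstThreeFields {

    // Hypothetical helper: returns the first three whitespace-separated fields
    // joined by single spaces, or null if the line has fewer than three fields.
    public static String firstThree(String line) {
        // trim() avoids an empty first token when the line starts with whitespace;
        // the limit of 4 leaves everything after the third field unsplit.
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 3) {
            return null;
        }
        return parts[0] + " " + parts[1] + " " + parts[2];
    }

    // Copies the reduced lines from in to out, skipping lines that are too short.
    public static void copy(Reader in, Writer out) throws IOException {
        BufferedReader br = new BufferedReader(in);
        BufferedWriter bw = new BufferedWriter(out);
        String line;
        while ((line = br.readLine()) != null) {
            String kept = firstThree(line);
            if (kept != null) {
                bw.write(kept);
                bw.write('\n');
            }
        }
        bw.flush(); // the caller stays responsible for closing in and out
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample mixing tabs and repeated spaces, standing in for the file.
        String sample = "8000000 \t abcdefg\t\thijklmn word word letter\n\n1 a\n";
        StringWriter result = new StringWriter();
        copy(new StringReader(sample), result);
        System.out.print(result); // prints "8000000 abcdefg hijklmn"
    }
}
```

To use it on the files from the question, pass `new FileReader("position2.txt")` and `new FileWriter("position.txt")` to `copy` inside a try-with-resources block. The same `\\s+` idea carries over to the regular expression above: replacing each literal space in the pattern with `\\s+` would accept tabs and repeated spaces there as well.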

Upvotes: 2
