Reputation: 2358
I have a problem. I'm trying to read a large .txt file, but I don't need every piece of data that's inside.
My .txt file looks something like this:
8000000 abcdefg hijklmn word word letter
I only need, let's say, the number and the first two text fields ("abcdefg" and "hijklmn"), which I then want to write to another file. I don't know how to read and write just the data that I need.
Here is my code so far:
    BufferedReader br = new BufferedReader(new FileReader("position2.txt"));
    BufferedWriter bw = new BufferedWriter(new FileWriter("position.txt"));
    String line;
    while ((line = br.readLine()) != null) {
        if (line.isEmpty() || line.trim().equals("") || line.trim().equals("\n")) {
            continue;
        } else {
            //bw.write(line + "\n");
            String[] data = line.split(" ");
            bw.write(data[0] + " " + data[1] + " " + data[2] + "\n");
        }
    }
    br.close();
    bw.close();
}
Can you give me some suggestions? Thanks in advance.
UPDATE: My .txt files are a bit weird. The code above works great when there is exactly one " " between the words, but my files can have a \t, several spaces, or a \t mixed with some spaces between them. How can I proceed now?
Upvotes: 2
Views: 18596
Reputation: 1936
else {
String[] res = line.split(" ");
bw.write(res[0] + " " + res[1] + " " + res[2] + "\n"); // the first three words...
}
Upvotes: 0
Reputation: 4191
If your files are really huge (above 50-100 MB, maybe GBs), you are sure that the first word is a number, and you need the two words after it, I would suggest reading one line at a time and iterating through that string. Stop when you find the third space.
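String str = br.readLine(); // br is the BufferedReader from the question
int numSpaces = 0, cnt = 0;
// Start from empty strings; with null elements, += would prepend "null".
String[] arr = {"", "", ""};
// Stop after the third space, or at the end of a short line.
while (numSpaces < 3 && cnt < str.length()) {
    if (str.charAt(cnt) == ' ') {
        numSpaces++;
    } else {
        arr[numSpaces] += str.charAt(cnt);
    }
    cnt++; // advance to the next character
}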
If your data is only a couple of MB, or has a lot of numbers inside, there is no need to worry about iterating char by char. Just read line by line, split each line, and then check the words, as already mentioned.
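As a variation on the loop above, the same "stop at the third space" idea can be sketched with indexOf, which avoids building the words character by character. The class and method names here (LinePrefix, beforeThirdSpace) are made up for illustration, and the sketch assumes the fields are separated by exactly one space each:

```java
class LinePrefix {
    /**
     * Returns the part of the line before its third space,
     * or the whole line if it contains fewer than three spaces.
     * Assumes single-space separators (hypothetical helper).
     */
    static String beforeThirdSpace(String line) {
        int pos = -1;
        for (int i = 0; i < 3; i++) {
            // Look for the next space after the previous one.
            pos = line.indexOf(' ', pos + 1);
            if (pos < 0) {
                return line; // fewer than three spaces: nothing to cut off
            }
        }
        return line.substring(0, pos);
    }

    public static void main(String[] args) {
        System.out.println(beforeThirdSpace("8000000 abcdefg hijklmn word word letter"));
    }
}
```

The result is already the "number word word" prefix, so it could be written to the output file directly without splitting.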
Upvotes: 0
Reputation: 208
Assuming all lines of your text file follow the structure you described, you could do this (replace FILE_PATH with your actual file path):
public static void main(String[] args) {
    try {
        Scanner reader = new Scanner(new File("FILE_PATH/myfile.txt"));
        PrintWriter writer = new PrintWriter(new File("FILE_PATH/myfile2.txt"));
        while (reader.hasNextLine()) {
            String line = reader.nextLine();
            String[] tokens = line.split(" ");
            writer.println(tokens[0] + ", " + tokens[1] + ", " + tokens[2]);
        }
        writer.close();
        reader.close();
    } catch (FileNotFoundException ex) {
        System.out.println("Error: " + ex.getMessage());
    }
}
You'll get something like: word0, word1, word2
Upvotes: 0
Reputation: 159185
Depending on the complexity of your data, you have a few options.
If the lines are simple space-separated values as shown, the simplest approach is to split the text and write the values you want to keep to the new file:
try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        String[] values = line.split(" ");
        if (values.length >= 3)
            bw.write(values[0] + ' ' + values[1] + ' ' + values[2] + '\n');
    }
}
If the values might be more complex, you could use a regular expression:
Pattern p = Pattern.compile("^(\\d+ \\w+ \\w+)");
try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        Matcher m = p.matcher(line);
        if (m.find())
            bw.write(m.group(1) + '\n');
    }
}
This ensures that the first value is digits only, and that the second and third values are word characters only (a-z, A-Z, _, 0-9).
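For the case raised in the question's update, where the fields may be separated by tabs, several spaces, or a mix of both, one possible adjustment is to match \s+ between the fields instead of a single space. The following is only a sketch: the class and method names (FirstThreeFields, extract) are invented for illustration, and it assumes the same "digits, word, word" layout as above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstThreeFields {
    // Optional leading whitespace, then a number and two word tokens,
    // each separated by one or more whitespace characters (spaces, tabs, or both).
    private static final Pattern FIELDS =
            Pattern.compile("^\\s*(\\d+)\\s+(\\w+)\\s+(\\w+)");

    /**
     * Returns the three fields joined by single spaces,
     * or null if the line does not have that shape (hypothetical helper).
     */
    static String extract(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + ' ' + m.group(2) + ' ' + m.group(3);
    }

    public static void main(String[] args) {
        // A tab and multiple spaces between the fields.
        System.out.println(extract("8000000\tabcdefg   hijklmn word word letter"));
    }
}
```

Inside the read loop, a line would then be written only when extract(line) returns a non-null value. If no validation of the fields is needed, line.trim().split("\\s+") in the first variant handles the mixed whitespace as well.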
Upvotes: 2