Reputation: 16841
I have a "plain old text file" where lines end with a newline character. For arbitrary reasons I need to read and parse this text file 4 lines at a time (X lines, for generality).
I'd like to use Java streams for this task and I know I can turn the file into a stream like so:
try (Stream<String> stream = Files.lines(Paths.get("file.txt"))) {
    stream.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}
But how can I use Java's Stream API to "bunch" the file into groups of 4 consecutive lines?
Upvotes: 4
Views: 3897
Reputation: 42234
There is a way to partition and process your file content into n-size chunks using the standard Java 8 Stream API. You can use Collectors.groupingBy() to partition the file content into chunks: you can collect them as a Collection<List<String>>, or you can apply some processing while collecting the lines (e.g. join each chunk into a single String). Take a look at the following example:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
public class ReadFileWithStream {

    public static void main(String[] args) throws Exception {
        // Path to a file to read (URL.toURI() may throw URISyntaxException,
        // hence "throws Exception")
        final Path path = Paths.get(ReadFileWithStream.class.getResource("/input.txt").toURI());
        final AtomicInteger counter = new AtomicInteger(0);
        // Size of a chunk
        final int size = 4;
        final Collection<List<String>> partitioned = Files.lines(path)
                .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size))
                .values();

        partitioned.forEach(System.out::println);
    }
}
My input file contains some numbers (one number per line), and when I run the code above I get something like:
[0, 0, 0, 2]
[0, -3, 2, 0]
[1, -3, -8, 0]
[2, -12, -11, -11]
[-8, -1, -8, 0]
[2, -1, 2, -1]
... and so on
Collectors.groupingBy() also allows me to use a different downstream collector. By default, Collectors.toList() is used, so each chunk is accumulated into a List<String> and I get a Collection<List<String>> as the final result.
Let's say I want to read 4-size chunks and sum all the numbers in each chunk. In this case I will use Collectors.summingInt() as the downstream function, and the returned result is a Collection<Integer>:
final AtomicInteger counter = new AtomicInteger(0); // fresh counter for this run
final Collection<Integer> partitioned = Files.lines(path)
        .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size,
                Collectors.summingInt(Integer::valueOf)))
        .values();
Output:
2
-1
-10
-32
-17
2
-11
-49
... and so on
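Joining each chunk into a single String, mentioned earlier, works the same way: just swap in Collectors.joining() as the downstream collector. A minimal sketch (the class and method names here are my own):

```java
import java.util.Collection;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JoinChunks {

    // Groups stream elements into chunks of the given size and joins
    // each chunk into one comma-separated String.
    public static Collection<String> joinInChunks(Stream<String> lines, int size) {
        AtomicInteger counter = new AtomicInteger(0);
        return lines
                .collect(Collectors.groupingBy(
                        it -> counter.getAndIncrement() / size,
                        Collectors.joining(", ")))
                .values();
    }
}
```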
And last but not least: Collectors.groupingBy() returns a map whose values are grouped under specific keys. That's why we call Map.values() in the end, to get a collection of the values contained in that map.
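One caveat worth noting (my addition, not part of the original answer): groupingBy uses a HashMap by default and gives no guarantee about the iteration order of that map. If the chunks must come back in file order, you can pass a TreeMap factory keyed by the chunk index — a sketch:

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OrderedChunks {

    // Same chunking as above, but a TreeMap keyed by the chunk index
    // guarantees the chunks are iterated in their original order.
    public static Collection<List<String>> chunksInOrder(Stream<String> lines, int size) {
        AtomicInteger counter = new AtomicInteger(0);
        return lines.collect(Collectors.groupingBy(
                        it -> counter.getAndIncrement() / size,
                        TreeMap::new,
                        Collectors.toList()))
                .values();
    }
}
```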
Hope it helps.
Upvotes: 4
Reputation: 3947
If you want to stick with streams, the only solution I see is to write your own custom collector. It's not really what collectors were designed for, but you can make it work.
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

private static final class CustomCollector {
    private final List<String> list = new ArrayList<>();
    private List<String> accumulateList = new ArrayList<>();

    public void accept(String str) {
        accumulateList.add(str);
        if (accumulateList.size() == 4) { // accumulate 4 strings
            String collect = String.join("", accumulateList);
            // I just joined them into one string, you can do whatever you want
            list.add(collect);
            accumulateList = new ArrayList<>();
        }
    }

    public CustomCollector combine(CustomCollector other) {
        throw new UnsupportedOperationException("Parallel stream not supported");
    }

    public List<String> finish() {
        if (!accumulateList.isEmpty()) {
            // flush the incomplete last chunk
            list.add(String.join("", accumulateList));
        }
        return list;
    }

    public static Collector<String, ?, List<String>> collector() {
        return Collector.of(CustomCollector::new, CustomCollector::accept,
                CustomCollector::combine, CustomCollector::finish);
    }
}
And use it like so:
stream.collect(CustomCollector.collector());
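If the helper class feels heavy, the same idea can be written inline with Collector.of, accumulating into sublists instead of joined strings. A sketch under my own naming:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

public class InlineChunkCollector {

    // Accumulates elements into sublists of the given size; the last
    // sublist may be shorter when the element count is not a multiple of size.
    public static Collector<String, ?, List<List<String>>> chunked(int size) {
        return Collector.of(
                ArrayList::new,
                (List<List<String>> acc, String line) -> {
                    // start a new sublist when the current one is full
                    if (acc.isEmpty() || acc.get(acc.size() - 1).size() == size) {
                        acc.add(new ArrayList<>());
                    }
                    acc.get(acc.size() - 1).add(line);
                },
                (a, b) -> {
                    throw new UnsupportedOperationException("Parallel stream not supported");
                });
    }
}
```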
Upvotes: 2
Reputation: 34470
Here's a straightforward way using Guava's Iterators.partition
method:
try (Stream<String> stream = Files.lines(Paths.get("file.txt"))) {
    Iterator<List<String>> iterator = Iterators.partition(stream.iterator(), 4);
    // iterator.next() returns each chunk as a List<String>
} catch (IOException e) {
    // handle exception properly
}
This is only suitable for sequential processing, but if you are reading a file from disk, I can hardly imagine any benefit from parallel processing...
EDIT: If you want, instead of working with the iterator, you could convert it again to a stream:
Stream<List<String>> targetStream = StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED),
        false);
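This iterator-to-stream conversion is not specific to Guava's iterator; it works for any Iterator. A self-contained sketch of the same pattern, using only the standard library:

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class IteratorToStream {

    // Wraps any Iterator in a sequential, ordered Stream.
    public static <T> Stream<T> toStream(Iterator<T> iterator) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED),
                false);
    }
}
```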
Upvotes: 3
Reputation: 19527
If you're open to using RxJava, you could use its buffer
capability:
Stream<String> stream = Files.lines(Paths.get("file.txt"));
Observable.fromIterable(stream::iterator)
        .buffer(4)                       // Observable<List<String>>
        .map(x -> String.join(", ", x))  // Observable<String>
        .forEach(System.out::println);
buffer creates an Observable that collects elements into lists of a certain size. In the above example, I added another transformation via map to make the lists more print-friendly, but you can transform the Observable as you see fit. For example, if you had a method processChunk that took a List<String> as an argument and returned a String, you could do:
Observable<String> fileObs =
        Observable.fromIterable(stream::iterator)
                .buffer(4)
                .map(x -> processChunk(x));
Upvotes: 2
Reputation: 298429
This is a job for java.util.Scanner
. In Java 9, you can simply use
try (Scanner s = new Scanner(PATH)) {
    s.findAll("(.*\\R){1,4}")
     .map(mr -> Arrays.asList(mr.group().split("\\R")))
     .forEach(System.out::println);
}
For Java 8, you can use the back-port of findAll from this answer. After adding an import static for that method, you can use it like
try (Scanner s = new Scanner(PATH)) {
    findAll(s, Pattern.compile("(.*\\R){1,4}"))
        .map(mr -> Arrays.asList(mr.group().split("\\R")))
        .forEach(System.out::println);
}
Note that the result of the match operation is a single string containing up to four lines (less for the last line(s)). If that’s suitable for your follow-up operation, you can skip splitting that string into individual lines.
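For instance, to keep each chunk as the raw matched string without splitting (Java 9+, and assuming the input ends with a newline, as the question states) — the helper name here is my own:

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.stream.Collectors;

public class ScannerChunks {

    // Collects each up-to-4-line chunk as one raw String, newlines included.
    // A trailing line without a line terminator would not be matched.
    public static List<String> rawChunks(String text) {
        try (Scanner s = new Scanner(text)) {
            return s.findAll("(.*\\R){1,4}")
                    .map(MatchResult::group)
                    .collect(Collectors.toList());
        }
    }
}
```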
You may even use the MatchResult's properties for more sophisticated processing of the chunks, e.g.
try (Scanner s = new Scanner(PATH)) {
    findAll(s, Pattern.compile("(.*)\\R(?:(.*)\\R)?(?:(.*)\\R)?(?:(.*)\\R)?"))
        .flatMap(mr -> IntStream.rangeClosed(1, 4)
            .mapToObj(ix -> mr.group(ix) == null ? null : ix + ": " + mr.group(ix)))
        .filter(Objects::nonNull)
        .forEach(System.out::println);
}
Upvotes: 4