Dev2017

Reputation: 938

Flink Task Manager timeout

My program gets very slow as more and more records are processed. I initially thought it was due to excessive memory consumption, since my program is String-intensive (I am using Java 11, so compact strings should be used whenever possible), so I increased the JVM heap:

-Xms2048m
-Xmx6144m

I also increased the task manager's memory as well as the heartbeat timeout in flink-conf.yaml:

jobmanager.heap.size: 6144m
heartbeat.timeout: 5000000

However, none of this helped. The program still gets very slow at about the same point, which is after processing roughly 3.5 million records, with only about 0.5 million more to go. As the program approaches the 3.5 million mark it gets very, very slow until it eventually times out; total execution time is about 11 minutes.

I checked the memory consumption in VisualVM, but it never goes above about 700 MB. My Flink pipeline looks as follows:

final StreamExecutionEnvironment environment = StreamExecutionEnvironment.createLocalEnvironment(1);
environment.setParallelism(1);
DataStream<Tuple> stream = environment.addSource(new TPCHQuery3Source(filePaths, relations));
stream.process(new TPCHQuery3Process(relations)).addSink(new FDSSink());
environment.execute("FlinkDataService");

The bulk of the work is done in the process function, where I implement database join algorithms; the columns are stored as Strings. Specifically, I am implementing query 3 of the TPC-H benchmark; see https://examples.citusdata.com/tpch_queries.html if you wish.
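
To give an idea of the shape of that work, here is a simplified sketch of the kind of in-memory hash join the process function performs; the relation tags, field positions, map names and Tuple accessors are placeholders rather than the actual implementation:

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Simplified sketch of the join: build hash maps for the smaller relations
// and probe them as the remaining rows arrive. Relation tags, field positions
// and map names are illustrative placeholders only.
public class HashJoinSketch extends ProcessFunction<Tuple, Tuple> {

    private final Map<String, Tuple> customersByCustKey = new HashMap<>();
    private final Map<String, Tuple> ordersByOrderKey = new HashMap<>();

    @Override
    public void processElement(Tuple tuple, Context ctx, Collector<Tuple> out) {
        switch (tuple.getRelation()) {
            case "customer":
                customersByCustKey.put(tuple.getField(0), tuple);
                break;
            case "orders":
                // keep the order only if its customer survived the segment filter
                if (customersByCustKey.containsKey(tuple.getField(1))) {
                    ordersByOrderKey.put(tuple.getField(0), tuple);
                }
                break;
            case "lineitem":
                // probe the orders map and emit the joined row downstream
                Tuple order = ordersByOrderKey.get(tuple.getField(0));
                if (order != null) {
                    out.collect(tuple);
                }
                break;
        }
    }
}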

The timeout error is this:

java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id <id> timed out.

Once I got this error as well:

Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap space

Also, here is my VisualVM monitoring; the screenshot was captured at the point where things get very slow: [VisualVM screenshot]

Here is the run loop of my source function:

while (run) {
    // Round-robin over the three file readers: read one line from each per pass.
    readers.forEach(reader -> {
        try {
            String line = reader.readLine();
            if (line != null) {
                // The relation a line belongs to is derived from the read order.
                Tuple tuple = lineToTuple(line, counter.get() % filePaths.size());
                if (tuple != null && isValidTuple(tuple)) {
                    sourceContext.collect(tuple);
                }
            } else {
                // This reader is exhausted; stop the source once all files are done.
                closedReaders.add(reader);
                if (closedReaders.size() == filePaths.size()) {
                    System.out.println("ALL FILES HAVE BEEN STREAMED");
                    cancel();
                }
            }
            counter.getAndIncrement();
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
}

I basically read a line from each of the 3 files I need. Based on the order of the files, I construct a tuple object (my custom class Tuple, representing a row in a table) and emit that tuple if it is valid, i.e. it fulfils certain conditions on the date.
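
For illustration, a rough sketch of what lineToTuple and isValidTuple do; the delimiter, the column position of the date and the cut-off value are assumptions, not the exact code:

// Rough sketch only: delimiter, column positions and cut-off date are assumptions.
private Tuple lineToTuple(String line, int relationIndex) {
    // TPC-H .tbl files are pipe-delimited; each line becomes one row
    String[] columns = line.split("\\|");
    return new Tuple(relations.get(relationIndex), columns);
}

private boolean isValidTuple(Tuple tuple) {
    // e.g. keep only rows whose date column lies before the query's cut-off date;
    // the column index and the cut-off value are placeholders
    String date = tuple.getField(4);
    return date == null || date.compareTo("1995-03-15") < 0;
}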

I am also suggesting that the JVM perform garbage collection at the 1 millionth, 1.5 millionth, 2 millionth, and 2.5 millionth record, like this:

System.gc()
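
The hint is wired into the read loop above, roughly like this (the thresholds are the record counts mentioned; the check itself is just a sketch):

// Sketch: hint the GC at fixed record counts inside the read loop.
long processed = counter.get();
if (processed == 1_000_000L || processed == 1_500_000L
        || processed == 2_000_000L || processed == 2_500_000L) {
    System.gc();   // only a suggestion; the JVM is free to ignore it
}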

Any thoughts on how I can optimize this?

Upvotes: 1

Views: 3792

Answers (2)

Felipe

Reputation: 7623

These are the properties that I changed on my Flink stand-alone cluster to compute the TPC-H query 03:

jobmanager.memory.process.size: 1600m
heartbeat.timeout: 100000
taskmanager.memory.process.size: 8g # default: 1728m

I implemented this query to stream only the Order table and I keep the other tables as state. Also, I am computing it as a windowless query, which I think makes more sense and is faster.

public class TPCHQuery03 {

    private final String topic = "topic-tpch-query-03";

    public TPCHQuery03() {
        this(PARAMETER_OUTPUT_LOG, "127.0.0.1", false, false, -1);
    }

    public TPCHQuery03(String output, String ipAddressSink, boolean disableOperatorChaining, boolean pinningPolicy, long maxCount) {
        try {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

            if (disableOperatorChaining) {
                env.disableOperatorChaining();
            }

            DataStream<Order> orders = env
                    .addSource(new OrdersSource(maxCount)).name(OrdersSource.class.getSimpleName()).uid(OrdersSource.class.getSimpleName());

            // Filter market segment "AUTOMOBILE"
            // customers = customers.filter(new CustomerFilter());

            // Filter all Orders with o_orderdate < 12.03.1995
            DataStream<Order> ordersFiltered = orders
                    .filter(new OrderDateFilter("1995-03-12")).name(OrderDateFilter.class.getSimpleName()).uid(OrderDateFilter.class.getSimpleName());

            // Join customers with orders and package them into a ShippingPriorityItem
            DataStream<ShippingPriorityItem> customerWithOrders = ordersFiltered
                    .keyBy(new OrderKeySelector())
                    .process(new OrderKeyedByCustomerProcessFunction(pinningPolicy)).name(OrderKeyedByCustomerProcessFunction.class.getSimpleName()).uid(OrderKeyedByCustomerProcessFunction.class.getSimpleName());

            // Join the last join result with Lineitems
            DataStream<ShippingPriorityItem> result = customerWithOrders
                    .keyBy(new ShippingPriorityOrderKeySelector())
                    .process(new ShippingPriorityKeyedProcessFunction(pinningPolicy)).name(ShippingPriorityKeyedProcessFunction.class.getSimpleName()).uid(ShippingPriorityKeyedProcessFunction.class.getSimpleName());

            // Group by l_orderkey, o_orderdate and o_shippriority and compute revenue sum
            DataStream<ShippingPriorityItem> resultSum = result
                    .keyBy(new ShippingPriority3KeySelector())
                    .reduce(new SumShippingPriorityItem(pinningPolicy)).name(SumShippingPriorityItem.class.getSimpleName()).uid(SumShippingPriorityItem.class.getSimpleName());

            // emit result
            if (output.equalsIgnoreCase(PARAMETER_OUTPUT_MQTT)) {
                resultSum
                        .map(new ShippingPriorityItemMap(pinningPolicy)).name(ShippingPriorityItemMap.class.getSimpleName()).uid(ShippingPriorityItemMap.class.getSimpleName())
                        .addSink(new MqttStringPublisher(ipAddressSink, topic, pinningPolicy)).name(OPERATOR_SINK).uid(OPERATOR_SINK);
            } else if (output.equalsIgnoreCase(PARAMETER_OUTPUT_LOG)) {
                resultSum.print().name(OPERATOR_SINK).uid(OPERATOR_SINK);
            } else if (output.equalsIgnoreCase(PARAMETER_OUTPUT_FILE)) {
                StreamingFileSink<String> sink = StreamingFileSink
                        .forRowFormat(new Path(PATH_OUTPUT_FILE), new SimpleStringEncoder<String>("UTF-8"))
                        .withRollingPolicy(
                                DefaultRollingPolicy.builder().withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
                                        .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
                                        .withMaxPartSize(1024 * 1024 * 1024).build())
                        .build();

                resultSum
                        .map(new ShippingPriorityItemMap(pinningPolicy)).name(ShippingPriorityItemMap.class.getSimpleName()).uid(ShippingPriorityItemMap.class.getSimpleName())
                        .addSink(sink).name(OPERATOR_SINK).uid(OPERATOR_SINK);
            } else {
                System.out.println("discarding output");
            }

            System.out.println("Stream job: " + TPCHQuery03.class.getSimpleName());
            System.out.println("Execution plan >>>\n" + env.getExecutionPlan());
            env.execute(TPCHQuery03.class.getSimpleName());
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        new TPCHQuery03();
    }
}

The UDFs are here: OrdersSource, OrderKeyedByCustomerProcessFunction, ShippingPriorityKeyedProcessFunction, and SumShippingPriorityItem. I am using com.google.common.collect.ImmutableList since the state will not be updated. Also, I am keeping only the necessary columns in the state, such as ImmutableList<Tuple2<Long, Double>> lineItemList.
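
For illustration, a minimal sketch of how such per-key state can be declared and read in one of the keyed process functions; the class name, key type and state name are assumptions based on the description above:

import com.google.common.collect.ImmutableList;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: per-key state that holds only the two columns needed from LineItem,
// wrapped in an ImmutableList because it is never mutated after it is built.
public class LineItemStateSketch
        extends KeyedProcessFunction<Long, ShippingPriorityItem, ShippingPriorityItem> {

    private transient ValueState<ImmutableList<Tuple2<Long, Double>>> lineItemList;

    @Override
    public void open(Configuration parameters) {
        ValueStateDescriptor<ImmutableList<Tuple2<Long, Double>>> descriptor =
                new ValueStateDescriptor<>(
                        "lineItemList",
                        TypeInformation.of(new TypeHint<ImmutableList<Tuple2<Long, Double>>>() {}));
        lineItemList = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void processElement(ShippingPriorityItem item,
                               Context ctx,
                               Collector<ShippingPriorityItem> out) throws Exception {
        ImmutableList<Tuple2<Long, Double>> lineItems = lineItemList.value();
        if (lineItems != null) {
            for (Tuple2<Long, Double> li : lineItems) {
                // probe the pre-loaded (orderkey, revenue contribution) pairs
                // for the current key and accumulate the result here
            }
        }
        out.collect(item);
    }
}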

Upvotes: 1

Dev2017

Reputation: 938

String.intern() saved me. I interned every string before storing it in my maps, and that worked like a charm.
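
For reference, a minimal sketch of the change; the map and variable names are placeholders, the point is simply to intern before storing:

// Sketch: intern every column value before it goes into the join maps so that
// repeated values (keys, dates, flags) share a single String instance.
String custKey = columns[0].intern();
String orderDate = columns[4].intern();
customerMap.put(custKey, orderDate);   // customerMap is a placeholder name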

Upvotes: 1
