Priyanka Basu
Priyanka Basu

Reputation: 407

Insert query failed in Vertica with ERROR code 4534 when triggered from RStudio

I am executing an insert query on Vertica DB and it is working fine when triggered from a SQL client(SQuirrel). But when I am trying to trigger the same query from RStudio it is returning the following error:

Error in .local(conn, statement, ...) : execute JDBC update query failed in dbSendUpdate ([Vertica]VJDBC ERROR: Receive on v_default_node0005: Message receipt from v_default_node0008 failed [])

The SQL query somewhat looks like:

insert into SCHEMA1.TEMP_NEW(
       SELECT C.PROGRAM_GROUP_ID,
              C.POPULATION_ID,
              C.PROGRAM_ID,
              C.FULLY_QUALIFIED_NAME,
              C.STATE,
              C.DATA_POINT_TYPE,
              C.SOURCE_TYPE,
              B.SOURCE_DATA_PARTITION_ID AS DATA_PARTITION_ID,
              C.PRIMARY_CODE_PRIMARY_DISPLAY,
              C.PRIMARY_CODE_ID,
              C.PRIMARY_CODING_SYSTEM_ID,
              C.PRIMARY_CODE_RAW_CODE_DISPLAY,
              C.PRIMARY_CODE_RAW_CODE_ID,
              C.PRIMARY_CODE_RAW_CODING_SYSTEM_ID,
              (C.COMPONENT_QUALIFIED_NAME)||('/2') AS SPLIT_PART,
              Count(*) AS RECORD_COUNT 
       from (SELECT DPL.PROGRAM_GROUP_ID,
                    DPL.POPULATION_ID,
                    DPL.PROGRAM_ID,
                    DPL.FULLY_QUALIFIED_NAME,
                   'MET' AS STATE,
                    DPL.DATA_POINT_TYPE,
                    DPL.IDENTIFIER_SOURCE_TYPE AS SOURCE_TYPE,
                    DPL.IDENTIFIER_SOURCE_DATA_PARTITION_ID AS DATA_PARTITION_ID,
                    DPL.PRIMARY_CODE_PRIMARY_DISPLAY,
                    DPL.PRIMARY_CODE_ID,
                    DPL.PRIMARY_CODING_SYSTEM_ID,
                    DPL.PRIMARY_CODE_RAW_CODE_DISPLAY,
                    DPL.PRIMARY_CODE_RAW_CODE_ID,
                    DPL.PRIMARY_CODE_RAW_CODING_SYSTEM_ID,
                    DPL.supporting_data_point_lite_id,
                    DPL.COMPONENT_QUALIFIED_NAME,
                    COUNT(*) AS RECORD_COUNT                           
                    FROM SCHEMA2.TABLE1 DPL
                    WHERE DPL.DATA_POINT_TYPE <> 'PREFERRED_DEMOGRAPHICS'
                    AND DPL.DATA_POINT_TYPE <> 'PERSON_DEMOGRAPHICS'
                    AND DPL.DATA_POINT_TYPE <> 'CALCULATED_RISK_SCORE'
                    AND DPL.DATA_POINT_TYPE <> '_NOT_RECOGNIZED'
                    AND DPL.POPULATION_ID NOT ILIKE '%ARCHIVE%'
                    AND DPL.POPULATION_ID NOT ILIKE '%SNAPSHOT%'
                    AND DPL.PROGRAM_GROUP_ID = '<PROGRAM_GROUP_ID>'
                    AND PROGRAM_GROUP_ID IS NOT NULL
                    AND DPL.IDENTIFIER_SOURCE_DATA_PARTITION_ID IS NULL
                    AND DPL.PRIMARY_CODE_RAW_CODE_ID IS NOT NULL
                    AND DPL.PRIMARY_CODE_ID IS NOT NULL
                    AND EXISTS (SELECT 1
                                FROM SCHEMA2.TABLE2 MO
                                WHERE MO.STATE = 'MET'
                                AND MO.POPULATION_ID NOT ILIKE '%ARCHIVE%'
                                AND MO.POPULATION_ID NOT ILIKE '%SNAPSHOT%'
                                AND DPL.PROGRAM_GROUP_ID = MO.PROGRAM_GROUP_ID
                                AND DPL.PROGRAM_ID = MO.PROGRAM_ID
                                AND DPL.FULLY_QUALIFIED_NAME = MO.FULLY_QUALIFIED_NAME
                                AND DPL.OUTCOME_SEQUENCE = MO.MEASURE_OUTCOME_SEQ
                                AND MO.PROGRAM_GROUP_ID = '<PROGRAM_GROUP_ID>')
             GROUP BY 1,
                      2,
                      3,
                      4,
                      5,
                      6,
                      7,
                      8,
                      9,
                      10,
                      11,
                      12,
                      13,
                      14,
                      15,
                      16) AS C
       Left Join            

             (SELECT DISTINCT SOURCE_DATA_PARTITION_ID,
                              supporting_data_point_lite_id  
              FROM SCHEMA2.TABLE3 DPI
              where DPI.SOURCE_DATA_PARTITION_ID is not null
              AND EXISTS (SELECT 1
                          FROM (SELECT DPL.supporting_data_point_lite_id 
                                FROM SCHEMA2.TABLE1 DPL
                                WHERE DPL.DATA_POINT_TYPE <> 'PREFERRED_DEMOGRAPHICS'
                                AND DPL.DATA_POINT_TYPE <> 'PERSON_DEMOGRAPHICS'
                                AND DPL.DATA_POINT_TYPE <> 'CALCULATED_RISK_SCORE'
                                AND DPL.DATA_POINT_TYPE <> '_NOT_RECOGNIZED'
                                AND DPL.POPULATION_ID NOT ILIKE '%ARCHIVE%'
                                AND DPL.POPULATION_ID NOT ILIKE '%SNAPSHOT%'
                                AND DPL.PROGRAM_GROUP_ID = '<PROGRAM_GROUP_ID>'
                                AND PROGRAM_GROUP_ID IS NOT NULL
                                AND DPL.IDENTIFIER_SOURCE_DATA_PARTITION_ID IS NULL
                                AND DPL.PRIMARY_CODE_RAW_CODE_ID IS NOT NULL
                                AND DPL.PRIMARY_CODE_ID IS NOT NULL
                                AND EXISTS (SELECT 1
                                            FROM SCHEMA2.TABLE2 MO
                                            WHERE MO.STATE = 'MET'
                                            AND MO.POPULATION_ID NOT ILIKE '%ARCHIVE%'
                                            AND MO.POPULATION_ID NOT ILIKE '%SNAPSHOT%'
                                            AND DPL.PROGRAM_GROUP_ID = MO.PROGRAM_GROUP_ID
                                            AND DPL.PROGRAM_ID = MO.PROGRAM_ID
                                            AND DPL.FULLY_QUALIFIED_NAME = MO.FULLY_QUALIFIED_NAME
                                            AND DPL.OUTCOME_SEQUENCE = MO.MEASURE_OUTCOME_SEQ
                                            AND MO.PROGRAM_GROUP_ID = '<PROGRAM_GROUP_ID>')) SDP                                       
                          WHERE DPI.supporting_data_point_lite_id = SDP.supporting_data_point_lite_id)) AS B 
       on C.supporting_data_point_lite_id = B.supporting_data_point_lite_id 
       group by 1,
                2,
                3,
                4,
                5,
                6,
                7,
                8,
                9,
                10,
                11,
                12,
                13,
                14,
                15)  

Only the schema name and the table names have been replaced. All other details the same.

Can someone please help me to fix the error.

Upvotes: 0

Views: 1553

Answers (1)

twall
twall

Reputation: 51

This error means some node-to-node communication that happened during the processing of your query failed for some reason.

There are many possible reasons this could happen. Sometimes a poor network or other environment issues could cause this to occur. If v_default_node0008 was taken down while this query was running for example, you may see this message. Other times it can be the sign of a Vertica bug, in which case you'd have to take it up with support and/or your administrator.

Normally when a query plan is executing, the control flow happens from the bottom up. At the lowest levels of the plan, various scan(s) read from projections, and when there's no data left to feed to operators above the scan(s), they stop, which causes their neighboring operators to stop, until ultimately the root operator stops and the query finishes.

Occasionally, there is a need to end the query in a top-down fashion. When you have many nodes, each passing data between multiple threads in service of your query, it can be tricky for Vertica to tear down everything atomically in a deterministic fashion. If a thread sending data stops before the thread receiving data was expecting it to (because the receiver hasn't realized the plan is being stopped yet), then it may log this error message. Usually when that happens it is innocuous; you'll see it in vertica.log but it doesn't bubble all the way up to the application. If one of those is making its way to the application then it is probably a Vertica bug.

So when can this happen?

One common scenario is when you have a LIMIT clause. The different scans producing rows on different nodes can't coordinate directly, so they have to be told by operators higher up in the plan when the limit has been reached.

It also happens when a query gets canceled. Cancellation can happen for many reasons -- at the request of the application, from the dba running interrupt_statement on your query, or via resource pool policy. If you exceed the RUNTIMECAP for your resource pool for example, the query is automatically cancelled if it exceeds a configured execution time threshold.

There may be others too, but those are the most common cases. It won't always be obvious that either limits or cancels are happening to you. The query may be rewritten to include a limit at various stages, and the application or and/or DBA's policy may be affecting things under the cover.

While this doesn't directly solve your problem, it hopefully gives you some additional context and ideas for further troubleshooting. The problem is likely going to be very specific to your use case, environment and data, and could be a bug. If you can't make progress I'd suggest taking it to Vertica support, since they will be more equipped to help you make sense of this further.

Upvotes: 3

Related Questions