Sayak Banerjee
Sayak Banerjee

Reputation: 1964

JPA / Hibernate is very slow when fetching data

I have a DAO that uses JPA/Hibernate to fetch data from SQL Server 2008 R2. In my specific usecase, i am doing a simple SELECT query that returns about 100000 records, but takes more than 35 minutes to do so.

I created a basic JDBC connection by manually loading the sql server driver and the same query returned 100k records in 15 seconds. So something is very wrong in my configuration.

Here is my springDataContext.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:jpa="http://www.springframework.org/schema/data/jpa"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
       http://www.springframework.org/schema/data/jpa http://www.springframework.org/schema/data/jpa/spring-jpa-1.3.xsd
       http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.1.xsd
       http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.1.xsd">

    <bean id="jpaVendorAdapter" class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter"/>

    <bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
        <property name="dataSource" ref="myDataSource"/>
        <property name="jpaVendorAdapter" ref="jpaVendorAdapter"/>
        <property name="persistenceXmlLocation" value="classpath:persistence.xml"/>
        <property name="jpaProperties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.SQLServer2008Dialect</prop>
                <prop key="hibernate.ejb.naming_strategy">org.hibernate.cfg.ImprovedNamingStrategy</prop>
                <prop key="hibernate.format_sql">true</prop>
                <prop key="hibernate.hbm2ddl.auto">none</prop>
                <prop key="hibernate.use_sql_comments">false</prop>
                <prop key="hibernate.show_sql">${hibernate.show_sql:false}</prop>
                <prop key="jadira.usertype.autoRegisterUserTypes">true</prop>
                <prop key="jadira.usertype.javaZone">America/Chicago</prop>
                <prop key="jadira.usertype.databaseZone">America/Chicago</prop>
                <prop key="jadira.usertype.useJdbc42Apis">false</prop>
            </props>
        </property>
    </bean>

    <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
        <property name="entityManagerFactory" ref="entityManagerFactory"/>
    </bean>

    <tx:annotation-driven/>

    <jpa:repositories base-package="com.mycompany.foo"/>

    <context:component-scan base-package="com.mycompany.foo.impl" />
</beans>

The bean myDataSource is provided by whatever app consumes the DAO, which is defined like so:

<bean id="myDataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="com.microsoft.sqlserver.jdbc.SQLServerDriver"/>
    <property name="url" value="jdbc:sqlserver://mysqlhost.mycompany.com:1433;database=MYDB"/>
    <property name="username" value="username"/>
    <property name="password" value="chucknorris"/>
</bean>

I have a complex query wherein I am setting fetchSize:

package com.mycompany.foo.impl;

import com.mycompany.foo.RecurringLoanPaymentAccountsDao;
import org.hibernate.Query;
import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.Session;
import org.joda.time.DateTime;

import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Named
public class FooDaoImpl implements FooDao {
    @PersistenceContext
    private EntityManager entityManager;

    public ScrollableResults getData(int chunkSize, DateTime tomorrow, DateTime anotherDate) {
        Session session = entityManager.unwrap(Session.class);

        Query query = session.createQuery(
            "from MyAccountTable as account " +
            // bunch of left joins (I am using @JoinColumns in my models)
            "where account.some_date >= :tomorrow " +
            "and account.some_other_date < :anotherDate"
            // <snip snip>
        );

        query.setParameter("tomorrow", tomorrow)
             .setParameter("anotherDate", anotherDate)
             .setFetchSize(chunkSize);

        return query.scroll(ScrollMode.FORWARD_ONLY);
    }
}

I also switched to vanilla JPA and did jpaQuery.getResultList(), but that was equally slow.

I can provide other relevant code if required. Any clue on what I'm doing wrong here?

Update 1: Add schema details

Unfortunately, I work for a bank, so I cannot share the exact code. But let me try to represent the relevant bits.

Here is my main table that I am querying:

@Entity
@Table(name = "MY_TABLE")
public class MyTable {
    @EmbeddedId
    private Id id = new Id();

    @Column(name = "SOME_COL")
    private String someColumn;

    // other columns

    @OneToMany(fetch = FetchType.LAZY)
    @JoinColumns({
        @JoinColumn(name = "ACCT_NBR", referencedColumnName = "ACCT_NBR", insertable = false, updatable = false),
        @JoinColumn(name = "CUST_NBR", referencedColumnName = "CUST_NBR", insertable = false, updatable = false)
    })
    private List<ForeignTable> foreignTable;

    // getter and setter for properties

    public static class Id implements Serializable {
        @Column(name = "ACCT_NBR")
        private short accountNumber;

        @Column(name = "CUST_NBR")
        private short customerNumber;
    }
}

Basically, every table has ACCT_NBR and CUST_NBR columns (including the foreign table) which are unique when clustered together. So my join condition includes both of these.

The model for the foreign table looks exactly the same (with and embedded ID like the main table above), of course with its own set of columns in addition to the ACCT_NBR and CUST_NBR.

Now i only care about 2 other columns in the foreign table apart from the ID columns mentioned above: TYPE_ID and ACCT_BALANCE.

TYPE_ID is what I wish to use in my LEFT JOIN condition, so that the final SQL looks like this:

LEFT JOIN FOREIGN_TABLE FRN
ON MAIN.ACCT_NBR = FRN.ACCT_NBR
AND MAIN.CUST_NBR = FRN.CUST_NBR
AND FRN.TYPE_ID = <some_static_id>

I want the LEFT JOIN to yield NULL data when there is no match for this specific TYPE_ID.

And in my SELECT claus, I select FOREIGN_TABLE.ACCT_BALANCE, which of course will be null if the left join above has no matching row.

This method is invoked from a spring batch application's Tasklet, and I have a feeling that it gave a null pointer exception once the tasklet finished processing and therefore, the readonly transaction was closed.

Upvotes: 4

Views: 16263

Answers (1)

Guillaume F.
Guillaume F.

Reputation: 6453

In the comments you point out that your @Joincolumn are or FetchType.EAGER. This is very aggressive and your problem.

Eager means Hibernate will JOIN FETCH all said columns in your query, parse and load all the records to dispatch the data in new instances of your entities. You may know that, if you join table A and B, the result will be a huge A.*, B.* with A x B records and A repeated many times. This is what's going on with Eager. That can escalate very quickly, especially with many Eager columns.

Switching to Lazy tells Hibernate to not load the data. It just prepares a Proxy object which will call a separate SELECT only when required (if your transaction is still open).

You can always force a FETCH manually in your HQL with JOIN FETCH table. This is the preferable way.

Upvotes: 6

Related Questions