Jade Dungeon

Hibernate

Configuration
Hibernate second-level cache
SQL logging
- How to find the origin of a query
- Setting up Hibernate query logging

Configuration

Hibernate second-level cache

How does the second-level cache work?

The second level cache stores the entity data, but NOT the entities themselves. The data is stored in a 'dehydrated' format which looks like a hash map where the key is the entity Id, and the value is a list of primitive values.

Here is an example on how the contents of the second-level cache look:

*-----------------------------------------*
|          Person Data Cache              |
|-----------------------------------------|
| 1 -> [ "John" , "Q" , "Public" , null ] |
| 2 -> [ "Joey" , "D" , "Public" ,  1   ] |
| 3 -> [ "Sara" , "N" , "Public" ,  1   ] |
*-----------------------------------------*

The second level cache gets populated when an object is loaded by Id from the database, using for example entityManager.find(), or when traversing lazy initialized relations.

How does the query cache work?

The query cache looks conceptually like a hash map where the key is composed of the query text and the parameter values, and the value is a list of the entity Ids that match the query:

*----------------------------------------------------------*
|                       Query Cache                        |
|----------------------------------------------------------|
| ["from Person where firstName=?", ["Joey"] ] -> [1, 2]   |
*----------------------------------------------------------*

Some queries don't return entities, instead they return only primitive values. In those cases the values themselves will be stored in the query cache. The query cache gets populated when a cacheable JPQL/HQL query gets executed.

What is the relation between the two caches?

If a query under execution has previously cached results, then no SQL statement is sent to the database. Instead the query results are retrieved from the query cache, and then the cached entity identifiers are used to access the second level cache.

If the second level cache contains data for a given Id, it re-hydrates the entity and returns it. If the second level cache does not contain the results for that particular Id, then an SQL query is issued to load the entity from the database.
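
The lookup described above can be sketched with plain maps. This is only a toy model under stated assumptions, not Hibernate's real implementation: the class name, the `lastName` query and the in-memory `database` map are all hypothetical, and only the path where the query results are already cached is modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the lookup described above; it does not use Hibernate's real classes.
class CacheLookupSketch {
    // Second level cache: entity Id -> dehydrated state (list of column values).
    final Map<Long, List<Object>> secondLevelCache = new HashMap<>();
    // Query cache: [query text, parameter values] -> Ids of the matching entities.
    final Map<List<Object>, List<Long>> queryCache = new HashMap<>();
    // Stand-in for the database table (assumption made for this sketch).
    final Map<Long, List<Object>> database = new HashMap<>();
    int selectsById = 0;

    // Resolves a query whose result Ids are already in the query cache.
    List<List<Object>> resolve(String queryText, List<Object> params) {
        List<Long> ids = queryCache.get(List.of(queryText, params));
        if (ids == null) {
            throw new IllegalStateException("no cached results; the SQL query would run here");
        }
        List<List<Object>> rows = new ArrayList<>();
        for (Long id : ids) {
            List<Object> state = secondLevelCache.get(id);
            if (state == null) {
                // Cache miss: one select by Id, then the state is cached.
                selectsById++;
                state = database.get(id);
                secondLevelCache.put(id, state);
            }
            rows.add(state);
        }
        return rows;
    }

    static CacheLookupSketch demo() {
        CacheLookupSketch s = new CacheLookupSketch();
        s.database.put(1L, Arrays.asList("John", "Q", "Public", null));
        s.database.put(2L, Arrays.asList("Joey", "D", "Public", 1L));
        s.database.put(3L, Arrays.asList("Sara", "N", "Public", 1L));
        // Only Ids 1 and 2 are in the second level cache.
        s.secondLevelCache.put(1L, s.database.get(1L));
        s.secondLevelCache.put(2L, s.database.get(2L));
        String query = "from Person where lastName=?"; // hypothetical query
        List<Object> params = List.of("Public");
        s.queryCache.put(List.of(query, params), List.of(1L, 2L, 3L));
        s.resolve(query, params); // Id 3 is missing: one select by Id
        s.resolve(query, params); // everything is cached now: no select
        return s;
    }

    public static void main(String[] args) {
        System.out.println("selects by Id: " + demo().selectsById); // prints 1
    }
}
```

The second call issues no select because the first call re-populated the entry for Id 3.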

How to set up the two caches in an application

The first step is to include the hibernate-ehcache jar in the classpath:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-ehcache</artifactId>
    <version>SOME-HIBERNATE-VERSION</version>
</dependency>

The following parameters need to be added to the configuration of your EntityManagerFactory or SessionFactory:

<prop key="hibernate.cache.use_second_level_cache">true</prop>
<prop key="hibernate.cache.use_query_cache">true</prop>
<prop key="hibernate.cache.region.factory_class">org.hibernate.cache.ehcache.EhCacheRegionFactory</prop>
<prop key="net.sf.ehcache.configurationResourceName">/your-cache-config.xml</prop>

Prefer using EhCacheRegionFactory instead of SingletonEhCacheRegionFactory. Using EhCacheRegionFactory means that Hibernate will create separate cache regions for Hibernate caching, instead of trying to reuse cache regions defined elsewhere in the application.
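
When the factory is built in code rather than in XML, the same four keys can be collected in a `java.util.Properties` object. The sketch below only builds that object; handing it to `Persistence.createEntityManagerFactory(unitName, properties)` is left as a comment, because the persistence unit name is not given here and would be an assumption.

```java
import java.util.Properties;

// Collects the cache settings listed above; the class name is hypothetical.
class CacheSettingsSketch {
    static Properties cacheProperties() {
        Properties props = new Properties();
        props.setProperty("hibernate.cache.use_second_level_cache", "true");
        props.setProperty("hibernate.cache.use_query_cache", "true");
        props.setProperty("hibernate.cache.region.factory_class",
                "org.hibernate.cache.ehcache.EhCacheRegionFactory");
        props.setProperty("net.sf.ehcache.configurationResourceName",
                "/your-cache-config.xml");
        return props;
    }

    public static void main(String[] args) {
        // With JPA on the classpath, these properties could be passed as the
        // second argument of Persistence.createEntityManagerFactory(...).
        cacheProperties().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```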

The next step is to configure the cache regions settings, in file your-cache-config.xml:

<?xml version="1.0" ?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="ehcache.xsd"
         updateCheck="false"
         name="yourCacheManager">

    <diskStore path="java.io.tmpdir"/>

    <!-- Region for cached entity data -->
    <cache name="yourEntityCache"
           maxEntriesLocalHeap="10000"
           eternal="false"
           overflowToDisk="false"
           timeToLiveSeconds="86400" />

    <!-- Region used by the query cache -->
    <cache name="org.hibernate.cache.internal.StandardQueryCache"
           maxElementsInMemory="10000"
           eternal="false"
           timeToLiveSeconds="86400"
           overflowToDisk="false"
           memoryStoreEvictionPolicy="LRU" />

    <!-- Settings applied to regions that are not declared above -->
    <defaultCache
           maxElementsInMemory="10000"
           eternal="false"
           timeToLiveSeconds="86400"
           overflowToDisk="false"
           memoryStoreEvictionPolicy="LRU" />
</ehcache>

If no cache settings are specified, default settings are taken, but this is probably best avoided. Make sure to give the cache a name by filling in the name attribute in the ehcache element.

Giving the cache a name prevents it from using the default name, which might already be used somewhere else in the application.

Using the second level cache

The second level cache is now ready to be used. In order to cache entities, annotate them with the @org.hibernate.annotations.Cache annotation:

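import javax.persistence.Entity;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Entity data is cached in the "yourEntityCache" region.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY, region = "yourEntityCache")
public class SomeEntity {
    ...
}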

Associations can also be cached by the second level cache, but by default this is not done. In order to enable caching of an association, we need to apply @Cache to the association itself:

@Entity
public class SomeEntity {

    @OneToMany
    @Cache(usage = CacheConcurrencyStrategy.READ_ONLY, region = "yourCollectionRegion")
    private Set<OtherEntity> other;

}

Using the query cache

After the query cache is configured, no queries are cached by default. Each query needs to be marked as cacheable explicitly. For example, this is how a named query can be marked as cached:

@NamedQuery(name="account.queryName", query="select acct from Account ...",
   hints={ @QueryHint(name="org.hibernate.cacheable", value="true")}
)

And this is how to mark a criteria query as cached:

List cats = session.createCriteria(Cat.class)
    .setCacheable(true).list();

The next section goes over some pitfalls that you might run into while trying to set up these two caches. These are behaviors that work as designed but can still be surprising.

Pitfall 1 - Query cache worsens performance causing a high volume of queries

There is a harmful side effect of how the two caches work together, which occurs if the cached entities returned by a query are configured to expire sooner than the cached results of that query.

If a query has cached results, it returns a list of entity Ids, which is then resolved against the second level cache. If the entities with those Ids were not configured as cacheable, or if they have expired, then a select by Id hits the database for each entity.

For example, if a cached query returned 1000 entity Ids and none of those entities were cached in the second level cache, then 1000 selects by Id will be issued against the database.

The solution to this problem is to configure query results expiration to be aligned with the expiration of the entities returned by the query.
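
The arithmetic behind this pitfall can be made concrete with a small counting sketch. It assumes a hypothetical timeline in seconds where both caches were filled at time 0, and it assumes that a query whose cached results have expired is re-executed as a single statement returning full rows; the method name and the numbers are illustrative only.

```java
// Counts SQL statements for one execution of a cacheable query.
// Simplified model, not Hibernate code.
class ExpiryAlignmentSketch {
    static int statementsAt(long now, long queryTtl, long entityTtl, int resultSize) {
        boolean queryResultsValid = now < queryTtl;
        boolean entitiesValid = now < entityTtl;
        if (!queryResultsValid) {
            return 1; // the query itself runs again
        }
        // Cached Ids are resolved one by one; each expired entity costs a select by Id.
        return entitiesValid ? 0 : resultSize;
    }

    public static void main(String[] args) {
        // Entities expire after 60 s, query results after 120 s.
        System.out.println(statementsAt(90, 120, 60, 1000)); // prints 1000
        // Aligned expiration: both expire after 60 s.
        System.out.println(statementsAt(90, 60, 60, 1000));  // prints 1
        System.out.println(statementsAt(30, 60, 60, 1000));  // prints 0
    }
}
```

With aligned expiration the worst case is one statement for the whole result set instead of one per entity.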

Pitfall 2 - Cache limitations when used in conjunction with @Inheritance

It is currently not possible to specify different caching policies for different subclasses of the same parent entity.

For example this will not work:

@Entity
@Inheritance
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
public class BaseEntity {
    ...
}

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // ignored
public class SomeReadWriteEntity extends BaseEntity {
    ...
}

@Entity
@Cache(usage = CacheConcurrencyStrategy.TRANSACTIONAL) // ignored
public class SomeTransactionalEntity extends BaseEntity {
    ...
}

In this case only the @Cache annotation of the parent class is considered, and all concrete entities have READ_ONLY concurrency strategy.

Pitfall 3 - Cache settings get ignored when using a singleton based cache

It is advised to configure the cache region factory as an EhCacheRegionFactory, and to specify an ehcache configuration via net.sf.ehcache.configurationResourceName.

There is an alternative to this region factory which is SingletonEhCacheRegionFactory. With this region factory the cache regions are stored in a singleton using the cache name as a lookup key.

The problem with the singleton region factory is that if another part of the application had already registered a cache with the default name in the singleton, this causes the ehcache configuration file passed via net.sf.ehcache.configurationResourceName to be ignored.

Conclusion

The second level and query caches are very useful if set up correctly, but there are some pitfalls to bear in mind in order to avoid unexpected behavior. All in all, it is a feature that works transparently and that, if used well, can significantly increase the performance of an application.


Useful Links

This blog post is a well-known reference to the inner details of the Hibernate second level and query caches - Truly Understanding the Second-Level and Query Caches

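SQL logging

Note that Hibernate only logs the prepared statements and the parameters that it sends to JDBC. A prepared statement uses ? as the placeholder for each query parameter, and the actual parameter values are logged below the statement.

These prepared statements are not the same as the SQL that finally reaches the database, and Hibernate cannot log that final query. The reason is that Hibernate only knows the prepared statement and the parameters it hands to JDBC; the actual query is built by JDBC and sent to the database.

To log the actual queries, a tool such as log4jdbc is required. How to use log4jdbc is not covered here. See: log4jdbc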
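
When reading such a log by hand, it can help to merge a logged statement with the values logged below it. The helper below is a hypothetical reading aid, not part of Hibernate or log4jdbc, and it is deliberately naive: it replaces every ? in order, so it would also replace a ? that appears inside a string literal, and it does not escape quotes.

```java
import java.util.List;

// Substitutes logged parameter values into the ? placeholders of a logged statement.
class LoggedSqlSketch {
    static String inline(String preparedSql, List<?> params) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : preparedSql.toCharArray()) {
            if (c == '?' && next < params.size()) {
                Object value = params.get(next++);
                // Strings are quoted, everything else is printed as is.
                out.append(value instanceof String ? "'" + value + "'" : String.valueOf(value));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "select * from person where first_name=? and id>?";
        System.out.println(inline(sql, List.of("Joey", 1)));
        // prints: select * from person where first_name='Joey' and id>1
    }
}
```

The result is only an approximation for the reader; it is not necessarily the text that the driver sends to the database.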

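How to find the origin of a query

The logged queries described above carry a comment that in most cases identifies what triggered the query. If the query was caused by a load, the comment is /*load your.entity.Name*/. If it is a named query, the comment contains the name of the query.

If the query is a one-to-many lazy initialization, the comment contains the name of the class and the property that triggered it.

Setting up Hibernate query logging

To get the queries logged, the following properties need to be added to the configuration of the session factory: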

<bean id="entityManagerFactory">
  ...
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.show_sql">true</prop>
      <prop key="hibernate.format_sql">true</prop>
      <prop key="hibernate.use_sql_comments">true</prop>
    </props>
  </property>
</bean>

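The example above shows the configuration of a Spring entity manager factory. The properties mean the following:

show_sql: enables logging of the queries.
format_sql: pretty-prints the SQL.
use_sql_comments: adds an explanatory comment to each query.

To also log the query parameter values, the following log4j configuration (or its equivalent in another logging framework) is needed: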

<logger name="org.hibernate.type">
    <level value="trace" />
</logger>

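If none of the above is enough

In most cases the comment created by use_sql_comments is enough to identify the origin of a query. If it is not, we can use the table names in the SQL to identify the entity that the query returns, and put a breakpoint in the constructor of that entity.

If the entity has no explicit constructor, we can add one and put the breakpoint on the super() call.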

@Entity
public class Employee {
    public Employee() {
        super(); // put the breakpoint here
    }
    ...
}

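Once the breakpoint is hit, switch to the debugger view that shows the program stack and walk through it from top to bottom. The place where the query was created will show up in the call stack.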
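
Where attaching a debugger is not practical, a variation of the same trick is to record the call stack inside the constructor. The sketch below is an assumption-laden illustration: the Employee class is a plain class without the @Entity annotation so that it compiles on its own, and the static field and the loadEmployee method exist only for the demo.

```java
// Records the call stack at construction time instead of stopping at a breakpoint.
class EmployeeOriginSketch {
    static class Employee {
        // Demo-only holder for the most recent creation stack.
        static volatile StackTraceElement[] lastCreationStack;

        Employee() {
            super();
            // Frame 0 is this constructor, the following frames are its callers.
            lastCreationStack = new Throwable().getStackTrace();
        }
    }

    // Stands in for the code path that makes Hibernate instantiate the entity.
    static Employee loadEmployee() {
        return new Employee();
    }

    public static void main(String[] args) {
        loadEmployee();
        for (StackTraceElement frame : Employee.lastCreationStack) {
            System.out.println("  at " + frame);
        }
    }
}
```

In a real entity the stack would more likely be written to a log at debug level than kept in a field.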