That pesky Hibernate

Yesterday a collegue of mine asked if I could build some functionality to physically delete records in our database.
Since deleting things is always a good idea, I immediately started to work on his request.
As we are using an ORM (Hibernate) to manage our relational persistence, this would just be a matter of adding a single line of code:

session.delete(someEntity);

And all the Hibernate magic combined with some cascading would take care of the deletion.
So I concluded my assignment fast, and got the well know warm cosy feeling: 'job wel done'.
However, suddenly I realised that the combinaton: "hibernate, magic, job well done, warm cosy feeling" has put me into problems before.
In fact, I remeber this since last time I was in that position I ended up feeling exactly like this man did.

Just to be sure I went back to the code and configured my logging to print out what Hibernate was doing.
As it turned out, something bad was going on. It seems that also Hibernate has difficulties deleting relationships (thats a line to think about).

The problem is that each child is deleted one by one.
Lets say you have two entities: Person and Product. There is a one to many relationship between Person and Product.
The relationship is uni-directional from Person to Product, mapped as a Set, managed by Person (as there is only one side) and the cascading is set to all-delete-orphan.

<hibernate-mapping default-access="field" >
   <class name="entities.Person" table="person">
      <id name="id" column="id" access="property">
         <generator class="native"/>
      </id>

      <set name="products" cascade="all-delete-orphan">
         <key column="person_id" not-null="true" update="false" foreign-key="person_fk"/>
         <one-to-many class="entities.Product"/>
      </set>
   </class>
</hibernate-mapping>

<hibernate-mapping default-access="field" >
   <class name="entities.Product" table="product">
   
      <id name="id" column="id" access="property">
         <generator class="native"/>
      </id>
      
      <property name="name" column="name"/>
   </class>
</hibernate-mapping>

If you delete a Person object which has 5 products, you will see that pesky Hibernate doing this:

31398 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31399 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31399 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31399 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31399 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31400 [main] DEBUG org.hibernate.SQL  - delete from person where id=?

In some conditions this might be acceptable, however, in our case there could be thousands of childs records.
I do know that with bulk operations and large datasets Hibernate is maybe not the best choice.
However, we configured Hibernate that it just worked for our case.

For example; in our real scenario the "lazy" attribute on the Set has been set to "Extra"
This allows to get a count on the child Collection without Hibernate loading every child.
Other bulk operations we managed by doing HQL. So in the end it worked out very good (and fast) so we where happy with the ORM functionality without having to pay performance costs.
So I was not planning to let this problem become a performance bottleneck.

As it turns out this 'problem' is very known one and there is lots to be found on the internet. I learned that Hibernate should normally be able to do a 'single shot delete'.
It is even mentioned in the manual! So I tried every possible combination, but could not force hibernate to do that 'single shot delete'.

Secondly I tried to do it the 'bulk' way by just issuing some delete HQL.
In our case its not a problem that it bypasses the Session, since the transaction consists only out of the delete.

YAP! As it turned out, in my real scenario the Child has a Hibernate <map> relation to a properties table.
When doing an HQL delete, the cascading on the <map> is not respected. This seems to be a issue as well (here).
I also tried to delete the <map> manually using HQL, but that doesn't work since there is no entity to 'grab' in the query.

Thirdly, I learned about the "on-delete" attribute which you can set on the <key> element in the <Set>.
The only requirement is that the Set must be the non-managed side of the relation (so inverse must be 'true').
In my Person/Product example this is not the case (as there is only one side).
But in my real scenario, which is slightly different, this is the case: the association is mapped bi-directionally and the Child is owner of the relationship.

If you enable the on-delete = "cascade" Hibernate would leave the deletion to the database.
In other words, it will depend on the database cascading delete, so your table DDL should have the 'on delete cascade' in its FK constraint DDL.
Cool note, that if you use hbm2ddl Hibernate generates the correct DDL including the on delete cascade.

Good, good! The only thing I now had to do was to tell our db administrator to add the 'on delete cascade'.

But wait, I also still have the <map>, So I tried to add the 'on-delete= cascade' there as well.
BUT! on-delete = 'cascade' can only be added on inverse associations!
The <map> is mapped uni-directional and is therefore always the owner. So, bad luck once again, this did not work.

Agreed: at this point I could refactor the <map> construct to a normal stand alone Entity and make a bidrectional one-to-many/many-to-one.
But that would imply many hours refactoring for something that was just working fine, so I felt not doing this.

Fourthly (and finally) I decided to just bypass Hibernate completely for this delete.
I created a named query :

<sql-query name="deletePerson">delete from person where id = :personId</sql-query>

I asked the database administrators to add 'on delete cascade' to the FK's of Product and the map table.
To be able to test this in our in-memory database, I added some extra DDL in the Person mapping:

<database-object>
 <create>
  alter table product drop constraint person_fk
  alter table product add constraint person_fk foreign key (person_id) references person(person_id) on delete cascade
  <!-- same here for the <map> relation-->
 </create>
 <drop/>
</database-object>  

This allows to simulate the database cascading in my tests (which run against an in-memory database).
So to conclude my journey, the steps you can try if you want a fast delete:

1. Do not use Hibernate
2. Just use the not working one-shot-delete, so you don't have to complain about Hibernate in some blog entry

Now as for your real options:

1. Map your 'one' side of the association with on-delete="cascade" and alter the FK constraints in the database with cascading
2. Use HQL to delete the entities yourself (do not forget this is a bulk operation and the delete is not reflected in the Session!)
3. Bypass Hibernate and use plain SQL + database cascading (same remark about the session)

Oh, btw, if you use the first suggestion, be aware that (even if you are using lazy true/extra on the association) Hibernate always loads the complete association before issuing the delete. Ain't that cute or what?
So if you have large datasets going on and you want performance, afaik you are limited to 2 and 3.

Comments

  1. Your deletes would've made at least a bit more sense had you been using optimistic locking; those would had been "delete from tbl where id = ? and version = ?".

    I'm pretty sure hibernate batches the deletes well, so that one statement per table is prepared and others are fired using it; of course it can never beat "delete from tbl where other_id = ?" but then again there's no optimistic locking there either.

    Not sure if it'd be faster if hibernate would issue one large update per table like:

    delete from tbl where (id, version) in ( (?, ?), (?, ?), ...)

    (at least postgres supports this)

    That would be both efficient (single scan for the rdbms or optimizable) and support optimistic locking.

    ReplyDelete
  2. which in memory database are you using buddy ? is that KDB ?


    Thanks
    Javin
    How to detect deadlock in java

    ReplyDelete
  3. Why you are not using hibernate annotations? are you forced to use Java < version 5?

    ReplyDelete
  4. @Joonas:

    You are right. In typical scenario's you would be using some kind of locking. And, if you would be using optimistic locking, then performing a simple "delete from child where parent_id = ?" (as I described) will break optimistic locking, since we did not take the version column into account.

    I overlooked this since I'm actually not using any form of locking in the scenario I described.
    So again, in case of optimistic locking the "cascade delete DDL" will not be an option.

    So in the optimistic locking case it does make sense that hibernate deletes the rows one by one to make sure the version column is considered. But as you pointed out: even then there exist probably better ways to do it (using in clause and combining id/version for example) so I'm not sure that even in that case we should be happy that hibernate deletes the childs one by one, without giving us another option.

    However, I do still believe (and that was the point of me writing this) that if you don't use optimistic locking (you use pessimistic locking or no locking at all) there is no reason that hibernate should default to this non performant way of deleting relationships. In that case there is no technical reason why they are deleted one by one.

    ReplyDelete
  5. @javarevisited:

    For this example I was using HSQLDB

    ReplyDelete

Post a Comment