Reputation: 4128
I am trying to index data across multiple tables using Solr's Data Import Handler. The official wiki on the DIH suggests using embedded entities to link multiple tables like so:
<document>
<entity name="item" pk="id" query="SELECT * FROM item">
<entity name="member" pk="memberid" query="SELECT * FROM member WHERE memberid='${item.memberid}'>
</entity>
</entity>
</document>
Another way that works is:
<document>
<entity name="item" pk="id" query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid">
</entity>
</document>
Are these two methods functionally different? Is there a performance difference? My guess would be that the first method is to support non-SQL tables, but I'm not sure.
Another though would be that, if using join tables in MySQL, using the SQL query method with multiple joins could cause multiple documents to be indexed instead of one.
Upvotes: 1
Views: 2351
Reputation: 603
If you want to optimize the performance you can use the second select to create a materialized view. This would means you can import with DIH directly from the materialized view (without creating any innerjoins or subentities in DIH, witch will means a lot faster import and the source DB will not be "overheated") If you want, you can create a cronjob to fast refresh your materialized view every day or as you need. If you are using MySQL you can take a look here for details on materialize views. Hope that will help.
Upvotes: 1
Reputation: 52799
Few things that I encountered :-
Would like to hear from Other users as well.
Upvotes: 2