DIH MySQL to Solr import problems

Question

I have trouble indexing documents from MySQL into Solr.

My Config:
data-config.xml

schema.xml

With this configuration I get documents like this:

"docs": [
      {
        "content": "[B@7f017c71",
        "id": 20785923,
        "cr": "2014-07-24T08:01:58Z",
        "title": "general motors entdeckt neue mängel bei hunderttausenden wagen - news - alle aktuellen news - dpa-afx - general motors dl-,01 - onvista",
        "kundenid": 1,
        "_version_": 1474502436614832000
      },

The title gets indexed properly.

The content shows up as a garbage string ("[B@7f017c71") and is not searchable.

Any ideas how I can fix that?

Thanks in advance.

Jayesh Bhoyar · Accepted Answer

I suspect that the content column in your database is a TEXT/BLOB type rather than VARCHAR (as title presumably is). That would explain why title is indexed correctly while content is not.
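A value such as "[B@7f017c71" is what the JVM prints by default for a byte array, which is consistent with content arriving as raw bytes rather than as a string. The following is only a minimal sketch of that effect, not what DIH does internally; the class and method names are hypothetical, and UTF-8 is an assumed encoding for the column.

```java
import java.nio.charset.StandardCharsets;

public class BlobToStringDemo {

    // Hypothetical helper: what you get when a byte[] is treated like any
    // other object and converted with the default Object.toString().
    static String naiveToString(byte[] raw) {
        return raw.toString(); // "[B@" followed by a hex identity hash
    }

    // Hypothetical helper: decode the bytes as text.
    // ASSUMPTION: the column data is UTF-8 encoded.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the bytes a binary column would deliver.
        byte[] raw = "general motors".getBytes(StandardCharsets.UTF_8);

        System.out.println(naiveToString(raw)); // looks like the indexed value
        System.out.println(decodeUtf8(raw));    // the readable text
    }
}
```

If the indexed value has the same "[B@..." shape as the first line printed here, the column is most likely reaching Solr as binary data, so the place to look is the column type and the SQL query rather than the analyzer chain.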

If the column holds BLOB or TEXT data, it may help to use a field type with a suitable set of tokenizers, analyzers and filters.

For example, adding a StandardTokenizerFactory splits the text into meaningful tokens.

An example of the fieldtype definition:

If the problem still persists, the following checks will help you investigate the issue:

1) Check what values you get from MySQL when you run the query: SELECT id,kundenid,LOWER(title) as title,LOWER(content) as content, DATE_FORMAT(cr,'%Y-%m-%dT%H:%i:%sZ') as cr,lang FROM articledata WHERE DATE(cr) BETWEEN DATE(DATE_SUB(now(),INTERVAL 3 DAY)) AND DATE(now()) AND content IS NOT NULL ORDER BY DATE(cr) DESC

2) Try changing the field type from textgen to string.

3) Try removing stripHTML="true" from the content field.

I hope this helps you resolve the issue, or at least investigate it further.
