Victor Smirnov
Victor Smirnov

Reputation: 3780

Rebuild sphinx index fail

We have 4 sphinx indexes built using data from one table. All indexes have the same source settings except that they take different documents. We have checks like this mod(id, 4) = <index number> to distribute documents and document attributes between indexes.

Question: One of the four indexes (the same one) fails to rebuild almost every time we rebuild the indexes. Other indexes never have this issue and are rebuild correctly.

We have partitioned the documents and attribute tables. For example this is how documents table is partitioned:

PARTITION BY HASH(mod(id, 4))(
  PARTITION `p0` COMMENT '',
  PARTITION `p1` COMMENT '',
  PARTITION `p2` COMMENT '',
  PARTITION `p3` COMMENT ''
);

We think that indexer hangs after it has received all documents but before it starts receiving attributes. We can see this when we check sessions on MySQL server.

The index which fails to rebuild is using mod(id, 4) = 0 condition.

We use Sphinx 2.0.4-release on Ubuntu 64bit 12.04.02 LTS.


Data source config

source ble_job_2 : ble_job
{
    sql_query = select job_notice.id as id, \
        body, title, source, company, \
        UNIX_TIMESTAMP(insertDate) as date, \
        substring(company, 1, 1) as companyletter, \
        job_notice.locationCountry as country, \
        location_us_state.stateName as state, \
        0 as expired, \
        clusterId, \
        groupCity, \
        groupCityAttr, \
        job_notice.cityLat as citylat, \
        job_notice.cityLng as citylng, \
        job_notice.zipLat as ziplat, \
        job_notice.zipLng as ziplng, \
        feedId, job_notice.rating as rating, \
        job_notice.cityId as cityid \
        from job_notice \
        left join location_us_state on job_notice.locationState = location_us_state.stateCode \
        where job_notice.status != 'expired' \
        and mod(job_notice.id, 4) = 1

    sql_attr_multi = uint attr from query; \
        select noticeId, attributeId as attr from job_notice_attribute where mod(noticeId, 4) = 1
} # source ble_job_2

Index config

index ble_job_2
{
    type            = plain
    source          = ble_job_2
    path            = /var/lib/sphinxsearch/data/ble_job_2

    docinfo         = extern
    mlock           = 0
    morphology      = none
    stopwords       = /etc/sphinxsearch/stopwords/blockwords.txt
    min_word_len    = 1
    charset_type    = utf-8
    enable_star     = 0
    html_strip      = 0
} # index_ble_job_2

Any help would be greatly appreciated.

Warm regards.

Upvotes: 1

Views: 608

Answers (1)

Victor Smirnov
Victor Smirnov

Reputation: 3780

Luckily we have fixed the issue.

We have applied the range query setup and this helped us to get index rebuild stable. I think this is because Sphinx runs several queries and each returns limited relatively small set of results. This allows MySQL to complete the query normally and sent all the results back to Sphinx.

The same issue is described on Sphinx forum Indexer Hangs & MySQL Query Sleeps.


The changes in the config for data source are

    sql_query_range = SELECT MIN(id),MAX(id) FROM job_notice where mod(job_notice.id, 4) = 1
    sql_range_step = 200000
    sql_query = select job_notice.id as id, \
    ...
        and mod(job_notice.id, 4) = 1 and job_notice.id >= $start AND job_notice.id <= $end

Please note that no ranges should be applied to sql_attr_multi query - Bad query in Sphinx MVA

Upvotes: 0

Related Questions