Cfreak
Cfreak

Reputation: 19319

Sphinx Search indexing some fields but not others

I'm using generic Sphinx with Python (though I tested this against PHP as well and got the same problem). I have a table where I have several fields I want to be able to search in sphinx against but it seems like only some of the fields get indexed.

Here's my source (dbconfig just has the connection information):

source bill_src : dbconfig
{
    sql_query       = \ 
        SELECT id,title,official_title,summary,state,chamber,UNIX_TIMESTAMP(last_action) AS bill_date FROM bill

    sql_attr_timestamp = bill_date

    sql_query_info      = SELECT * FROM bill WHERE id=$id
}

Here's the index

index bills
{
    source          = bill_src
    path            = /var/data/bills
    docinfo         = extern
    charset_type        = sbcs
}

I'm trying to use extended match mode. It seems that title and summary are fine but the official_title, the state and the chamber fields are ignored in the index. So for example if I do:

@official_title Affordable Care Act

I get:

query error: no field 'official_title' found in schema

but the same query with @summary produces results. Any ideas what I'm missing?

EDIT Here's the table I'm trying to index:

+--------------------+--------------+------+-----+---------+----------------+
| Field              | Type         | Null | Key | Default | Extra          |
+--------------------+--------------+------+-----+---------+----------------+
| id                 | int(11)      | NO   | PRI | NULL    | auto_increment |
| bt50_id            | int(11)      | YES  | MUL | NULL    |                |
| type               | varchar(10)  | YES  |     | NULL    |                |
| title              | varchar(255) | YES  |     | NULL    |                |
| official_title     | text         | YES  |     | NULL    |                |
| summary            | text         | YES  |     | NULL    |                |
| congresscritter_id | int(11)      | NO   | MUL | NULL    |                |
| last_action        | datetime     | YES  |     | NULL    |                |
| sunlight_id        | varchar(45)  | YES  |     | NULL    |                |
| number             | int(11)      | YES  |     | NULL    |                |
| state              | char(2)      | YES  |     | NULL    |                |
| chamber            | varchar(45)  | YES  |     | NULL    |                |
| session            | varchar(45)  | YES  |     | NULL    |                |
| featured           | tinyint(1)   | YES  |     | 0       |                |
| source_url         | varchar(255) | YES  |     |         |                |
+--------------------+--------------+------+-----+---------+----------------+

Upvotes: 0

Views: 5464

Answers (2)

Cfreak
Cfreak

Reputation: 19319

I seem to have fixed the problem, though I'll admit this is all dumb luck so it might not be a root cause:

First I thought maybe it didn't like the order of the fields in the query I have the only attribute field last so I decided to move it to after the ID:

 SELECT id, UNIX_TIMESTAMP(last_action) AS bill_date, \
     title,official_title,summary,state,chamber, FROM bill

This did not fix the problem.

Secondly, I noticed all the example date fields are converted using UNIX_TIMESTAMP and then aliased to the same name, so instead of UNIX_TIMESTAMP(last_action) AS bill_date I changed it to UNIX_TIMESTAMP(last_action) AS last_action ... the first attempt tripped me up though because it still wasn't working.

Finally I dropped the date altogether and added each field successfully (re-indexing and testing each time). Each time it worked and finally I added the date field on the end and I was able to sort by it and search all the fields. So the final query is:

    SELECT \
      id,title,official_title,summary,state,chamber, \ 
      UNIX_TIMESTAMP(last_action) AS last_action FROM bill

It seems that attribute fields must come after the full text fields and aliases must be the same name as the actual field name. I find it strange that the date field seemed fine but other fields suddenly disappeared (randomly!).

I hope this helps someone else though I feel it might be some kind of isolated bug that doesn't affect many people. (This is on OSX and sphinx was compiled by hand)

Upvotes: 1

pp19dd
pp19dd

Reputation: 3633

Little rusty on sphinx, but believe in your source { } clause needs a sql_field_string definition.

source bill_src : dbconfig
{
    sql_query       = \ 
        SELECT \
          id,title,official_title,summary,state,chamber, \
          UNIX_TIMESTAMP(last_action) AS bill_date \
        FROM bill

    sql_attr_timestamp = bill_date
    sql_field_string = official_title

    sql_query_info      = SELECT * FROM bill WHERE id=$id
}

According to http://sphinxsearch.com/docs/1.10/conf-sql-field-string.html the sql_field_string declaration will index and store the string for referencing. That's different from a sql_attr_string, which is stored but not indexed.

Upvotes: 0

Related Questions