Reputation: 19319
I'm using generic Sphinx with Python (though I tested this against PHP as well and got the same problem). I have a table where I have several fields I want to be able to search in sphinx against but it seems like only some of the fields get indexed.
Here's my source (dbconfig just has the connection information):
source bill_src : dbconfig
{
sql_query = \
SELECT id,title,official_title,summary,state,chamber,UNIX_TIMESTAMP(last_action) AS bill_date FROM bill
sql_attr_timestamp = bill_date
sql_query_info = SELECT * FROM bill WHERE id=$id
}
Here's the index
index bills
{
source = bill_src
path = /var/data/bills
docinfo = extern
charset_type = sbcs
}
I'm trying to use extended match mode. It seems that title
and summary
are fine but the official_title
, the state
and the chamber
fields are ignored in the index. So for example if I do:
@official_title Affordable Care Act
I get:
query error: no field 'official_title' found in schema
but the same query with @summary produces results. Any ideas what I'm missing?
EDIT Here's the table I'm trying to index:
+--------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| bt50_id | int(11) | YES | MUL | NULL | |
| type | varchar(10) | YES | | NULL | |
| title | varchar(255) | YES | | NULL | |
| official_title | text | YES | | NULL | |
| summary | text | YES | | NULL | |
| congresscritter_id | int(11) | NO | MUL | NULL | |
| last_action | datetime | YES | | NULL | |
| sunlight_id | varchar(45) | YES | | NULL | |
| number | int(11) | YES | | NULL | |
| state | char(2) | YES | | NULL | |
| chamber | varchar(45) | YES | | NULL | |
| session | varchar(45) | YES | | NULL | |
| featured | tinyint(1) | YES | | 0 | |
| source_url | varchar(255) | YES | | | |
+--------------------+--------------+------+-----+---------+----------------+
Upvotes: 0
Views: 5464
Reputation: 19319
I seem to have fixed the problem, though I'll admit this is all dumb luck so it might not be a root cause:
First I thought maybe it didn't like the order of the fields in the query I have the only attribute field last so I decided to move it to after the ID:
SELECT id, UNIX_TIMESTAMP(last_action) AS bill_date, \
title,official_title,summary,state,chamber, FROM bill
This did not fix the problem.
Secondly, I noticed all the example date fields are converted using UNIX_TIMESTAMP
and then aliased to the same name, so instead of UNIX_TIMESTAMP(last_action) AS bill_date
I changed it to UNIX_TIMESTAMP(last_action) AS last_action
... the first attempt tripped me up though because it still wasn't working.
Finally I dropped the date altogether and added each field successfully (re-indexing and testing each time). Each time it worked and finally I added the date field on the end and I was able to sort by it and search all the fields. So the final query is:
SELECT \
id,title,official_title,summary,state,chamber, \
UNIX_TIMESTAMP(last_action) AS last_action FROM bill
It seems that attribute fields must come after the full text fields and aliases must be the same name as the actual field name. I find it strange that the date field seemed fine but other fields suddenly disappeared (randomly!).
I hope this helps someone else though I feel it might be some kind of isolated bug that doesn't affect many people. (This is on OSX and sphinx was compiled by hand)
Upvotes: 1
Reputation: 3633
Little rusty on sphinx, but believe in your source { }
clause needs a sql_field_string definition.
source bill_src : dbconfig
{
sql_query = \
SELECT \
id,title,official_title,summary,state,chamber, \
UNIX_TIMESTAMP(last_action) AS bill_date \
FROM bill
sql_attr_timestamp = bill_date
sql_field_string = official_title
sql_query_info = SELECT * FROM bill WHERE id=$id
}
According to http://sphinxsearch.com/docs/1.10/conf-sql-field-string.html the sql_field_string declaration will index and store the string for referencing. That's different from a sql_attr_string, which is stored but not indexed.
Upvotes: 0