nabizan
nabizan

Reputation: 3365

mysql sphinx generate unique id

I have my database segmented in 8 parts, where each part contains database with table user_data, for better search performance im using sphinx to index all those data but im come accross one problem ... since table user_data dont have any unique field to represent each row but its a 1 to many table i have trouble to run my sphinx index correctly since it requires unique id and this way resolve in duplicate ids ... any idea how can i workaround this? or generate unique id throught all sub indexes from different segments?

example:

SELECT user_id, item_id, info
FROM user_data

Which returns something like:

+----------+-----------------------+
| user_id  | item_id    | info     |
+----------+-----------------------+
| 10       | 151        | asdf     |
| 10       | 152        | test     |
| 11       | 151        | 545      |
| 12       | 151        | sdfsd    |
| 12       | 152        | eewwe    |
| 12       | 153        | dfsd     |

but i have to get

+----------+-----------------------------+
| user_id  | item_id    | info     | id  |
+----------+-----------------------------+
| 10       | 151        | asdf     | 1   |
| 10       | 152        | test     | 2   |
| 11       | 151        | 545      | 3   |
| 12       | 151        | sdfsd    | 4   |
| 12       | 152        | eewwe    | 5   |
| 12       | 153        | dfsd     | 6   |

of course id must be unique throght all segments

Upvotes: 1

Views: 2820

Answers (5)

Ben
Ben

Reputation: 9

We are using crc32(uuid_short()) for 32 bit implementations of sphinx. This works, most of the time! Ofc. one cannot rely upon a 32 bit digest of a

Upvotes: -2

dennis
dennis

Reputation: 630

As proposed in another answer saying to use UUID. But sphinx can not use UUID as id. You will need an INT. Therefor use UUID_SHORT and then you have a unique integer (for mysql). If this does not work out of the box (e.g. if you aare using Ubuntu-11.04), you will get an error like this:

WARNING: DOCID_MAX document_id, skipping

You will need to compile sphinx source with –enable-id64, or just go to sphinx website and get an up to date package (which is compiled with –enable-id64). A more complete example of this indexing method is given in this blog entry

Upvotes: 0

nabizan
nabizan

Reputation: 3365

first of all you should set before action query to set your variable

sql_query_pre = SET @a := 1;

then use this variable to get fictional auto increment

sql_query = SELECT @a := @a + 1 AS id, user_id, item_id, info FROM user_data

Upvotes: 3

PureForm
PureForm

Reputation: 1003

You could do something like this while indexing:

SELECT user_id + 10 * 1 AS id, 1 AS segment_id, itme_id, info FROM user_data_1

... adding a segment_id. You would have eight of these, so the indexing query would look something like:

SELECT (user_id + 10) * 1 AS id, 1 AS segment_id, itme_id, info FROM user_data_1
UNION
SELECT (user_id + 10) * 2 AS id, 2 AS segment_id, itme_id, info FROM user_data_2
UNION
SELECT (user_id + 10) * 3 AS id, 3 AS segment_id, itme_id, info FROM user_data_3
UNION
SELECT (user_id + 10) * 4 AS id, 4 AS segment_id, itme_id, info FROM user_data_4
UNION
SELECT (user_id + 10) * 5 AS id, 5 AS segment_id, itme_id, info FROM user_data_5
UNION
SELECT (user_id + 10) * 6 AS id, 6 AS segment_id, itme_id, info FROM user_data_6
UNION
SELECT (user_id + 10) * 7 AS id, 7 AS segment_id, itme_id, info FROM user_data_7
UNION
SELECT (user_id + 10) * 8 AS id, 8 AS segment_id, itme_id, info FROM user_data_8

Then when you query sphinx and get back the IDs, just undo the arithmetic by dividing the id by segment_id and subtracting 10. This way all the ids will be unique within sphinx. Just make sure the attribute type can handle the size of the ids you'll be indexing.

Upvotes: 0

rxmnnxfpvg
rxmnnxfpvg

Reputation: 31013

I'm unfamiliar with Sphinx, but if you're looking to create ids that are unique across tables, in your case:

  1. One option is to use a UUID as a unique index on all the tables -- the chances of them colliding are minute.
  2. Another option is, if you know the max size of a table, to only use numbers in that range plus an offset. E.g., Table 1's ids: 1 - 10000, Table 2's ids: 10001 - 20000, etc. You can even set the id fields to be AUTO_INCREMENT and set their start numbers at the beginning of the particular range.

Upvotes: 1

Related Questions