Reputation: 36756
I have a table with words from a text (the table is called token
), each word is a row in the table. I want to retrieve adjacent words in the result.
Example: My name is Renato
must return:
My | name
name | is
is | Renato
The following query works, but is slow. The textblockid
determines the text that the word belongs, the sentence
is the sentence count in the textblock (but at the moment the value is 1 for all) and the position
attribute determines the order of the words.
select w1.text,w2.text
from token as w1,
(select textblockid,sentence,position,text from token
order by textblockid,sentence,position) as w2
where w1.textblockid = w2.textblockid
and w1.sentence = w2.sentence
and w1.position = w2.position - 1
Is there a better/faster way to do this?
Upvotes: 0
Views: 81
Reputation: 71939
Maybe an INNER JOIN
with a second instance of token
performs better. But it all depends on the data types of your columns, and the indexes you have in place.
For example, if sentence
is a text column, the comparison between w1.sentence
and w2.sentence
will probably be very expensive. If it's a numeric id (a foreign key to a sentences
table), and if you have an index on the column, it should be way faster. Assuming this last scenario, you could try this:
select w1.text,w2.text
from token as w1
INNER JOIN token as w2
ON w2.sentence = w1.sentence
AND w1.position = w2.position - 1
AND w1.textblockid = w2.textblockid
Upvotes: 0
Reputation: 2694
I don't know postgresql in detail but for sure query could be simpler in sql server:
select w1.text,w2.text
from token as w1, token as w2
where w1.textblockid = w2.textblockid
and w1.sentence = w2.sentence
and w1.position = w2.position - 1
(I think it is better to use simplest query and leave the rest for optimizer, which may be misleaded by your from subquery).
However if you have index on (textblockid, sentence, position) you really can't get anything more with sql.
Upvotes: 1