Reputation: 11
I'm trying to use Solr highlighting, but I'm running into a problem. When I request this URL: http://localhost:8983/solr/pesquisa-jurisprudencia/select?fl=id,assunto&hl=on&q=insalubridade&wt=json&hl.fl=*
it doesn't return any highlighted terms:
{
"responseHeader":{
"status":0,
"QTime":167,
"params":{
"q":"insalubridade",
"hl":"on",
"fl":"id,assunto",
"hl.fl":"*",
"wt":"json"}},
"response":{"numFound":8,"start":0,"docs":[
{
"id":"saj-4815412",
"assunto":["Adicional de Insalubridade",
"Assistência Judiciária Gratuita",
"Aviso-prévio",
"Décimo Terceiro Salário [Proporcional]",
"Férias [Proporcionais]",
"Fruição / Gozo",
"Horas Extras",
"Indenização / Dobra / Terço Constitucional",
"Intervalo Intrajornada",
"Levantamento / Liberação",
"Multa [de 40%] do FGTS",
"Reflexos",
"Salário por Equiparação / Isonomia",
"Saldo de Salário"]},
{
"id":"saj-4676226",
"assunto":["Adicional de Insalubridade",
"Assistência Judiciária Gratuita",
"Aviso-prévio",
"Décimo Terceiro Salário [Proporcional]",
"Férias [Proporcionais]",
"Fruição / Gozo",
"Horas Extras",
"Indenização / Dobra / Terço Constitucional",
"Intervalo Intrajornada",
"Levantamento / Liberação",
"Multa [de 40%] do FGTS",
"Reflexos",
"Salário por Equiparação / Isonomia",
"Saldo de Salário"]},
{
"id":"saj-661600"},
{
"id":"pje1-24544513",
"assunto":["Saldo de Salário"]},
{
"id":"pje2-8188452",
"assunto":["Adicional de Insalubridade",
"Grupo Econômico"]},
{
"id":"pje2-10910741",
"assunto":["Adicional de Insalubridade",
"Grupo Econômico"]},
{
"id":"pje2-7109330",
"assunto":["Adicional de Horas Extras"]},
{
"id":"pje1-6880206",
"assunto":["Efeitos",
"Integração em Verbas Rescisórias"]}]
},
"highlighting":{
"saj-4815412":{},
"saj-4676226":{},
"saj-661600":{},
"pje1-24544513":{},
"pje2-8188452":{},
"pje2-10910741":{},
"pje2-7109330":{},
"pje1-6880206":{}}}
However, when I search for "adicional" with this URL: http://localhost:8983/solr/pesquisa-jurisprudencia/select?fl=id,assunto&hl=on&q=adicional&wt=json&hl.fl=*
it works:
{
"responseHeader":{
"status":0,
"QTime":88,
"params":{
"q":"adicional",
"hl":"on",
"fl":"id,assunto",
"hl.fl":"*",
"wt":"json"}},
"response":{"numFound":32,"start":0,"docs":[
{
"id":"saj-4815412",
"assunto":["Adicional de Insalubridade",
"Assistência Judiciária Gratuita",
"Aviso-prévio",
"Décimo Terceiro Salário [Proporcional]",
"Férias [Proporcionais]",
"Fruição / Gozo",
"Horas Extras",
"Indenização / Dobra / Terço Constitucional",
"Intervalo Intrajornada",
"Levantamento / Liberação",
"Multa [de 40%] do FGTS",
"Reflexos",
"Salário por Equiparação / Isonomia",
"Saldo de Salário"]},
{
"id":"pje1-14030983",
"assunto":["Diferenças por Desvio de Função"]},
{
"id":"saj-4676226",
"assunto":["Adicional de Insalubridade",
"Assistência Judiciária Gratuita",
"Aviso-prévio",
"Décimo Terceiro Salário [Proporcional]",
"Férias [Proporcionais]",
"Fruição / Gozo",
"Horas Extras",
"Indenização / Dobra / Terço Constitucional",
"Intervalo Intrajornada",
"Levantamento / Liberação",
"Multa [de 40%] do FGTS",
"Reflexos",
"Salário por Equiparação / Isonomia",
"Saldo de Salário"]},
{
"id":"pje2-8188452",
"assunto":["Adicional de Insalubridade",
"Grupo Econômico"]},
{
"id":"saj-661600"},
{
"id":"pje1-13247674",
"assunto":["Adicional de Hora Extra"]},
{
"id":"sap2-732470",
"assunto":["Horas In Itinere",
"Supressão de Horas Extras Habituais - Indenização"]},
{
"id":"pje1-24446947",
"assunto":["Abono",
"Abono Pecuniário",
"Acordo Individual e/ou Coletivo de Trabalho",
"Adicional",
"Adicional de Hora Extra",
"Adicional de Horas Extras",
"Alteração da Jornada",
"Aviso Prévio",
"Base de Cálculo",
"Cartão de Ponto",
"Controle de Jornada",
"Desconfiguração de Justa Causa",
"Décimo Terceiro Salário",
"Décimo Terceiro Salário Proporcional",
"Efeitos",
"FGTS",
"Folha Individual de Presença",
"Fruição / Gozo",
"Férias / Gozo / Fruição",
"Férias Proporcionais",
"Indenizado - Efeitos",
"Indenização",
"Indenização / Dobra / Terço Constitucional",
"Indenização Adicional",
"Intervalo Intrajornada",
"Levantamento de Valor",
"Liberação / Entrega das Guias",
"Multa de 40% do FGTS",
"Multa do Artigo 467 da CLT",
"Multa do Artigo 477 da CLT",
"Reflexos",
"Saldo de Salário",
"Seguro Desemprego",
"Termo de Rescisão Contratual",
"Verbas Rescisórias",
"Ônus da Prova"]},
{
"id":"pje1-35506695",
"assunto":["Diferenças por Desvio de Função"]},
{
"id":"sap2-493296"}]
},
"highlighting":{
"saj-4815412":{
"assunto":["<em>Adicional</em> de Insalubridade"]},
"pje1-14030983":{},
"saj-4676226":{
"assunto":["<em>Adicional</em> de Insalubridade"]},
"pje2-8188452":{
"assunto":["<em>Adicional</em> de Insalubridade"]},
"saj-661600":{},
"pje1-13247674":{
"assunto":["<em>Adicional</em> de Hora Extra"]},
"sap2-732470":{},
"pje1-24446947":{
"assunto":["<em>Adicional</em>"]},
"pje1-35506695":{},
"sap2-493296":{}}}
What I have noticed so far is that it works when the field value starts with the term I'm searching for. If the term is in the middle of the value, no highlight is returned.
Why is this happening, and how can I fix it? I'm using Solr 8.5 and, very importantly, I'm a Solr beginner... ;-)
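To compare runs like the two above more easily, a small helper can separate the documents that received a highlight from those that did not. This is only a sketch: the function name `split_by_highlight` is hypothetical, and the sample is a trimmed copy of the "adicional" response shown earlier.

```python
import json


def split_by_highlight(response):
    """Return (highlighted_ids, empty_ids) from a parsed Solr JSON response."""
    highlighted, empty = [], []
    for doc_id, fields in response.get("highlighting", {}).items():
        # An empty object means Solr produced no snippet for this document.
        (highlighted if fields else empty).append(doc_id)
    return highlighted, empty


# Trimmed copy of the "adicional" response from the question.
sample = json.loads("""
{
  "highlighting": {
    "saj-4815412": {"assunto": ["<em>Adicional</em> de Insalubridade"]},
    "pje1-14030983": {},
    "saj-661600": {}
  }
}
""")

highlighted, empty = split_by_highlight(sample)
print("highlighted:", highlighted)
print("empty:", empty)
```

Running it against each full response shows at a glance which hits matched in a field other than the highlighted one.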
So, I ran a few more tests and noted the results in this table.
My first test (#1) worked, but only because I restricted the query to the field "assunto" in the "q" parameter. When I ran a general query (without specifying the field), it didn't work (see test #3). However, in test #6 I searched for the word "adicional", which is at the beginning of the field, and it worked. In test #4 I repeated test #3 with the method changed to "unified", and it didn't work either. So it doesn't seem to be an issue related to the highlighting method.
Here is my schema (I removed the comments to save space):
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Definindo o próprio esquema
<schema name="example-DIH-db" version="1.6"-->
<schema name="sentencas" version="1.6">
<field name="_version_" type="plong" indexed="true" stored="true"/>
<field name="_root_" type="string" indexed="true" stored="false"/>
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="id_documento" type="string" indexed="true" stored="true"/>
<field name="id_processo" type="string" indexed="true" stored="true"/>
<field name="num_processo" type="string" indexed="true" stored="true"/>
<field name="ano_processo" type="string" indexed="true" stored="true"/>
<field name="id_tipo_documento" type="string" indexed="true" stored="true"/>
<field name="tipo_documento" type="string" indexed="true" stored="true"/>
<field name="origem_dados" type="string" indexed="true" stored="true"/>
<field name="sigla_classe_processual" type="string" indexed="true" stored="true"/>
<field name="desc_classe_processual" type="string" indexed="false" stored="true"/>
<field name="orgao_julgador" type="string" indexed="true" stored="true"/>
<field name="juiz_sentenciante" type="string" indexed="true" stored="true"/>
<field name="turma" type="string" indexed="true" stored="true"/>
<field name="relator" type="string" indexed="true" stored="true"/>
<field name="data_referencia" type="date" indexed="true" stored="true"/>
<field name="nome_data_referencia" type="string" indexed="true" stored="true"/>
<field name="data_assinatura" type="date" indexed="true" stored="true"/>
<field name="data_publicacao" type="date" indexed="true" stored="true"/>
<field name="ementa" type="text_trt18" indexed="true" stored="true" />
<field name="texto_documento" type="text_trt18" indexed="true" stored="true" multiValued="true"/>
<field name="assunto" type="text_pt" indexed="true" stored="true" multiValued="true" termVectors="true"/>
<field name="parte" type="text_pt" indexed="true" stored="true" multiValued="true" termVectors="true"/>
<field name="link_andamentos" type="string" indexed="false" stored="true"/>
<field name="link_visualizar_documento" type="string" indexed="false" stored="true"/>
<field name="link_visualizar_acordao" type="string" indexed="false" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>
<uniqueKey>id</uniqueKey>
<copyField source="num_processo" dest="text"/>
<copyField source="ano_processo" dest="text"/>
<copyField source="sigla_classe_processual" dest="text"/>
<copyField source="orgao_julgador" dest="text"/>
<copyField source="texto_documento" dest="text"/>
<copyField source="assunto" dest="text"/>
<copyField source="parte" dest="text"/>
<fieldType name="text_trt18" class="solr.TextField" >
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.BrazilianStemFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.BrazilianStemFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
<fieldType name="plong" class="solr.LongPointField" docValues="true"/>
<fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>
<fieldType name="pints" class="solr.IntPointField" docValues="true" multiValued="true"/>
<fieldType name="pfloats" class="solr.FloatPointField" docValues="true" multiValued="true"/>
<fieldType name="plongs" class="solr.LongPointField" docValues="true" multiValued="true"/>
<fieldType name="pdoubles" class="solr.DoublePointField" docValues="true" multiValued="true"/>
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
<fieldType name="pdates" class="solr.DatePointField" docValues="true" multiValued="true"/>
<fieldType name="binary" class="solr.BinaryField"/>
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymGraphFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.FlattenGraphFilterFactory"/>
-->
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
/>
<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.FlattenGraphFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
/>
<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
possible with WordDelimiterGraphFilter in conjuncton with stemming. -->
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.FlattenGraphFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory"
pattern="([^a-z])" replacement="" replace="all"
/>
</analyzer>
</fieldType>
<fieldType name="phonetic" stored="false" indexed="true" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
</analyzer>
</fieldType>
<fieldType name="payloads" stored="false" indexed="true" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
</analyzer>
</fieldType>
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="descendent_path" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory" />
</analyzer>
</fieldType>
<fieldType name="ancestor_path" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
</analyzer>
</fieldType>
<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
<fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />
<!-- Portuguese -->
<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" />
<filter class="solr.PortugueseLightStemFilterFactory"/>
</analyzer>
</fieldType>
</schema>
Upvotes: 1
Views: 187
Reputation: 11
As pointed out by @abhijit-bashetti, the query field and the highlight field should be the same. In my case, I was querying my default field (text) and trying to highlight a specific field (assunto). To solve my problem, I used a workaround: I repeat the general query against the specific field, so my q parameter looks like q=insalubridade AND assunto:insalubridade
. With that, the output is highlighted.
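A minimal sketch of how that workaround query could be assembled from client code. The helper name `build_highlight_url` is hypothetical, the host and core come from the question, and the term is assumed to be a single plain word with no Solr query-syntax characters that would need escaping.

```python
from urllib.parse import urlencode

# Core URL taken from the question; adjust host and core name for your setup.
SOLR_SELECT = "http://localhost:8983/solr/pesquisa-jurisprudencia/select"


def build_highlight_url(term, field="assunto"):
    """Build a select URL that repeats the term against the highlighted field."""
    params = {
        # General query plus the same term restricted to the highlighted field.
        "q": f"{term} AND {field}:{term}",
        "fl": f"id,{field}",
        "hl": "on",
        "hl.fl": field,
        "wt": "json",
    }
    # urlencode percent-encodes the colon and comma and turns spaces into "+".
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = build_highlight_url("insalubridade")
print(url)
```

Because of the AND, only documents that contain the term in assunto are returned, so this narrows the result set compared with the plain general query.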
Note, however, that in some cases the highlight output is still empty, as shown below.
"highlighting":{
"pje2-8188452":{
"assunto":["Adicional de <mark>Insalubridade</mark>"]},
"pje2-10910741":{
"assunto":["Adicional de <mark>Insalubridade</mark>"]},
"saj-4815412":{
"assunto":["Adicional de <mark>Insalubridade</mark>"]},
"saj-4676226":{
"assunto":["Adicional de <mark>Insalubridade</mark>"]},
"saj-661600":{},
"pje1-24544513":{},
"pje2-7109330":{},
"pje1-6880206":{}}}
Upvotes: 0
Reputation: 8668
I used the sample data below:
{
"id":"saj-4815412",
"assunto":"Adicional de Insalubridade"
},
{
"id":"pje1-14030983",
"assunto":"Diferenças por Desvio de Função"
},
{
"id":"saj-4676226",
"assunto":"Adicional de Insalubridade"
}
Here is the query I executed:
http://localhost:8983/solr/TestDemo3/select?fl=id_str,assunto&hl.fl=assunto&hl=on&q=assunto:insalubridade
Here is the output:
The second search URL is:
http://localhost:8983/solr/TestDemo3/select?fl=id_str,%20assunto&hl.fl=assunto&hl=on&q=assunto:adicional
The output with highlighting is:
Upvotes: 1
Reputation: 697
Are you using the default "original" highlighter? It has issues with language support because it doesn't provide BreakIterator support. You could try the unified highlighter and see if it works; otherwise I would need your schema to dig further.
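For anyone who wants to try that comparison, here is a rough sketch that builds the same request once per highlighter through Solr's `hl.method` parameter. The helper name `highlight_url` is hypothetical and the host and core are copied from the question. The question's own follow-up tests report that switching the method alone did not change the result, so treat this as a diagnostic step rather than a fix.

```python
from urllib.parse import urlencode

# Core URL taken from the question; adjust for your own setup.
SOLR_SELECT = "http://localhost:8983/solr/pesquisa-jurisprudencia/select"


def highlight_url(term, method):
    """Build a select URL that asks Solr for a specific highlighter."""
    params = {
        "q": term,
        "fl": "id,assunto",
        "hl": "on",
        "hl.fl": "assunto",
        "hl.method": method,  # e.g. "original" or "unified"
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


# Print one URL per highlighter so the two responses can be compared by hand.
for method in ("original", "unified"):
    print(highlight_url("insalubridade", method))
```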
Upvotes: 0