Reputation: 165
I am trying to build an index using the Indri UI. I created a parameter file and a stopword list for building the index. When I click Build Index, the UI keeps building for a long time and the index is never built.
The UI hangs at this step.
Here is my input.txt file:
<DOC>
<DOCNO>
@switcheery
</DOCNO>
<TEXT>
Lol?"@elsidi01: "@switcheery: God bless that man that loves to see me happy......"#I"
</TEXT>
<DOCNO>
@Roseefly
</DOCNO>
<TEXT>
42% of Irish People have a Medical Card/Doctor Only Card. ##I have to admit we are a great little country #budget15 #healthcare
</TEXT>
<DOCNO>
@FammySaulkner
</DOCNO>
<TEXT>
@dthompsonRTS11 @Kirkpatrick_29 gosh dev you read my mind #I??crossfit
</TEXT>
<DOCNO>
@codesilence
</DOCNO>
<TEXT>
data mine the heart..for ?? #nsa #i
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/tGe9wH6M0e
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/BmMMpLHcVA
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/GyuzOVA68T
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/sCw5U1DXMy
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/JwhqJoSN1T
</TEXT>
<DOCNO>
@SandySchmitz3
</DOCNO>
<TEXT>
Having kids is the biggest leap of faith a person can make. 2 create new lives & hope they spread goodness throughout the world. #I WISH
</TEXT>
<DOCNO>
@my_15minutes
</DOCNO>
<TEXT>
wubba lubba dub dub means I'm in great pain, please help me by winning the #I'dbemortyfied contest on @TheMarySue
</TEXT>
<DOCNO>
@darren1966h
</DOCNO>
<TEXT>
I managed to finish the Cheshire welcomes you! assignment! Try it for yourself! http://t.co/NYCrn7DQTu #GameInsight #iPad #i...
</TEXT>
<DOCNO>
@GomitasYnutella
</DOCNO>
<TEXT>
Set de fotos: dee-lirious: #i regret every day of my life i didn’t love you http://t.co/Z48py9uOOC
</TEXT>
<DOCNO>
@PernelleBdt
</DOCNO>
<TEXT>
"Un seul être vous manque et tout est dépeuplé."
Ma plus belle étoile, mon plus beau souvenir.. 3 ans déjà.. #I #14102011 #memories ??
</TEXT>
<DOCNO>
@news8martha
</DOCNO>
<TEXT>
The 2.7 inches of rain that's fallen in La Crosse would translate to 27 inches of snow!
#I'll top complaining now!
</TEXT>
</DOC>
Here is my stopwords.txt:
<parameters>
<stopper>
<word>happy</word>
<word>wondeful</word>
<word>sad</word>
<word>cute</word>
</stopper>
</parameters>
Am I missing something? Please help me with this; I am new to IR. I have no idea about the parameter file: I created one, but I am not sure where it is used.
Upvotes: 0
Views: 225
Reputation: 1316
For the stopword list, I simply wrote one word per line, without any tags. Also, I think the correct TRECTEXT format puts each document in its own <DOC></DOC> tag, with the <DOCNO> and <TEXT> tags inside it. For example:
<DOC>
<DOCNO>
@switcheery
</DOCNO>
<TEXT>
Lol?"@elsidi01: "@switcheery: God bless that man that loves to see me happy......"#I"
</TEXT>
</DOC>
<DOC>
<DOCNO>
@Roseefly
</DOCNO>
<TEXT>
42% of Irish People have a Medical Card/Doctor Only Card. ##I have to admit we are a great little country #budget15 #healthcare
</TEXT>
</DOC>
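If the input file is large, fixing it by hand gets tedious. Here is a minimal Python sketch of both changes, assuming every document in the file is a <DOCNO> block immediately followed by its <TEXT> block, and that the stopword file is well-formed XML like the one in the question. The function names `rewrap_trectext` and `stopwords_to_lines` are made up for illustration; they are not part of Indri.

```python
import re
import xml.etree.ElementTree as ET

# One <DOCNO>...</DOCNO> followed by its <TEXT>...</TEXT>.
# DOTALL lets "." match newlines, since the tags sit on separate lines.
PAIR = re.compile(r"<DOCNO>(.*?)</DOCNO>\s*<TEXT>(.*?)</TEXT>", re.DOTALL)


def rewrap_trectext(raw: str) -> str:
    """Wrap every DOCNO/TEXT pair in its own <DOC> element.

    Any existing <DOC>/</DOC> tags in the input are ignored, because only
    the matched pairs are copied to the output.
    """
    docs = []
    for docno, text in PAIR.findall(raw):
        docs.append(
            f"<DOC>\n<DOCNO>\n{docno.strip()}\n</DOCNO>\n"
            f"<TEXT>\n{text.strip()}\n</TEXT>\n</DOC>"
        )
    return "\n".join(docs) + "\n"


def stopwords_to_lines(xml_text: str) -> str:
    """Turn the <word> entries of an XML stopword file into one word per line."""
    root = ET.fromstring(xml_text)
    words = [w.text.strip() for w in root.iter("word") if w.text]
    return "\n".join(words) + "\n"


if __name__ == "__main__":
    # Tiny inline sample shaped like the file in the question.
    sample = (
        "<DOC>\n<DOCNO>\n@a\n</DOCNO>\n<TEXT>\nfirst tweet\n</TEXT>\n"
        "<DOCNO>\n@b\n</DOCNO>\n<TEXT>\nsecond tweet\n</TEXT>\n</DOC>\n"
    )
    print(rewrap_trectext(sample))
```

To use it on the real files, read input.txt and stopwords.txt into strings, pass them through the two functions, and write the results to new files before pointing the Indri UI at them. The tweet text is matched with a regular expression rather than an XML parser because it contains characters such as a bare & that a strict XML parser would reject.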
Upvotes: 0