Reputation: 111
I have a question about the concordance command in NLTK (Python). First, I worked through a simple example:
from nltk.book import *
text1.concordance("monstrous")
which worked just fine. Now I have my own .txt file and would like to run the same command on it. I have a list called "textList" and want to find the word "CNA", so I ran
textList.concordance('CNA')
Yet, I got the error
AttributeError: 'list' object has no attribute 'concordance'.
In the example, is text1 not a list? I wonder what is going on here.
Upvotes: 10
Views: 29771
Reputation: 139
In a Jupyter notebook (or a Google Colab notebook), the full process: MS Word file --> text file --> an NLTK object:
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.text import Text
import docx2txt
myTextFile = docx2txt.process("/mypath/myWordFile")
tokens = word_tokenize(myTextFile)
print(tokens)
textList = Text(tokens)
textList.concordance('contract')
Upvotes: 2
Reputation: 516
I got it working with this code:
import sys
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.text import Text

def main():
    # exit early when no input file was given on the command line
    if len(sys.argv) < 2:
        return
    # read text
    with open(sys.argv[1], "r") as f:
        text = f.read()
    # split the raw string into word tokens and wrap them in a Text object
    tokens = word_tokenize(text)
    textList = Text(tokens)
    textList.concordance('is')
    print(tokens)

if __name__ == '__main__':
    main()
based on this site
Upvotes: 5
Reputation: 43324
.concordance() is not a general-purpose function: it is a method of the Text class in nltk. So you can't just call it on any Python object (like your list). The text1 in the example is a Text object, not a list.
Basically, if you want to use .concordance(), you have to instantiate a Text object first, and then call the method on that object.
A Text is typically initialized from a given document or corpus. E.g.:
import nltk.corpus
from nltk.text import Text
moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))
concordance(word, width=79, lines=25)
Print a concordance for word with the specified context window. Word matching is not case-sensitive.
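To make the documented behaviour concrete, here is a rough pure-Python sketch of what a concordance does: find each occurrence of a word, ignoring case, and show it with some context on both sides. This is not NLTK's implementation; simple_concordance is a hypothetical helper written only for illustration, and its width and lines parameters merely imitate the signature quoted above.

```python
def simple_concordance(tokens, word, width=79, lines=25):
    """Return up to `lines` strings showing `word` with surrounding context.

    Illustrative only: this imitates the idea behind Text.concordance(),
    not its exact output format.
    """
    # characters of context on each side of the matched word
    half = max(1, (width - len(word) - 2) // 2)
    target = word.lower()
    results = []
    for i, tok in enumerate(tokens):
        # matching is case-insensitive, as in the quoted documentation
        if tok.lower() != target:
            continue
        left = " ".join(tokens[:i])[-half:]
        right = " ".join(tokens[i + 1:])[:half]
        results.append(f"{left:>{half}} {tok} {right}")
        if len(results) >= lines:
            break
    return results


sample_tokens = "The CNA report says the cna team met".split()
for line in simple_concordance(sample_tokens, "cna", width=30):
    print(line)
```

The helper takes a plain list of tokens, which is the same kind of data the question's textList holds before it is wrapped in a Text.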
So I imagine something like this would work (not tested)
import nltk.corpus
from nltk.text import Text
textList = Text(nltk.corpus.gutenberg.words('YOUR FILE NAME HERE.txt'))
textList.concordance('CNA')
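One caveat: nltk.corpus.gutenberg.words() only reads files that ship with the Gutenberg corpus, so it will not find a file of your own. If you already have a list of tokens, as in the question, a minimal sketch is to pass that list straight to Text. The sample string and the build_text helper below are made up for illustration, and str.split() is used as a stand-in tokenizer so the example needs no extra NLTK data downloads.

```python
from nltk.text import Text


def build_text(raw):
    """Wrap whitespace-separated tokens from `raw` in an nltk Text object."""
    # hypothetical helper; swap in nltk.word_tokenize(raw) for better tokens
    tokens = raw.split()
    return Text(tokens)


# made-up sample standing in for the contents of your own .txt file
sample = "CNA filed the claim. The adjuster called CNA again."
text = build_text(sample)

# Text has .concordance(); a plain list does not
text.concordance("CNA")
```

For a real file, you would read its contents into a string first and pass that string to the helper in place of sample.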
Upvotes: 33