Reputation: 1
I am working on a child language project and would like to use the CHILDES Corpus Reader package to analyze children's language data. However, the methods do not output anything. I am trying with the XML version of the Valian corpus, which can be downloaded from https://childes.talkbank.org/data-xml/Eng-NA/.
Here is the code I tried. The first four lines read the corpus and output the XML file IDs. However, the calls to .words(), .sents() and .MLU() produce no output.
~python
import nltk
from nltk.corpus.reader import CHILDESCorpusReader
valian = CHILDESCorpusReader('./Valian', '.*.xml')
valian.fileids()
#print words
valian.words('./Valian/01a.xml')
#print sentences
valian.sents('./Valian/01a.xml')
#print MLU
valian.MLU('./Valian/01a.xml')
~
Here is the output, which is either an empty list or 0, whereas I was expecting a list of words or a list of sentences.
~python
>>> valian.words('/01a.xml')
[]
>>> valian.sents('/01a.xml')
[]
>>> valian.MLU('/01a.xml')
[0]
~
This is a bit odd, as I was just trying to follow the NLTK documentation (https://www.nltk.org/howto/childes.html). Thank you very much for your help!
Upvotes: 0
Views: 106
Reputation: 21
This is a bug in NLTK 3.6 and 3.7. It should be resolved in the 3.8 release; I also got around it by downgrading to 3.5.
GH tracking issue, PR, duplicate question
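If you are not sure which version you have installed, a quick check along these lines might help. This is only a rough sketch: the helper name `childes_reader_affected` is hypothetical (it is not part of NLTK), and the "affected" range is simply the 3.6 and 3.7 releases mentioned above.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def childes_reader_affected(version_string):
    """Return True if the version is a 3.6.x or 3.7.x release.

    Assumption: only the releases named in this answer are treated as affected.
    """
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) in {(3, 6), (3, 7)}


if __name__ == "__main__":
    try:
        installed = version("nltk")
    except PackageNotFoundError:
        print("NLTK is not installed in this environment")
    else:
        if childes_reader_affected(installed):
            print(f"NLTK {installed}: CHILDESCorpusReader may return empty results")
        else:
            print(f"NLTK {installed}: outside the 3.6/3.7 range reported here")
```

If the check flags your install, `pip install nltk==3.5` pins the version that worked for me, and `pip install --upgrade nltk` should pick up a newer release once the fix is out.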
Upvotes: 0