Arthur_Kan

Reputation: 1

Output issues with NLTK CHILDES Corpus Reader in Python

I am working on a child language project and would like to use NLTK's CHILDES Corpus Reader to analyze children's language data. However, the reader's methods return empty results. I am testing with the XML version of the Valian corpus, which can be downloaded from https://childes.talkbank.org/data-xml/Eng-NA/.

Here is the code I tried. The first four lines load the corpus and correctly list the XML file IDs. However, the calls to .words(), .sents() and .MLU() do not return the expected data.

~python

import nltk
from nltk.corpus.reader import CHILDESCorpusReader

# Load every XML file under ./Valian and list the file IDs
valian = CHILDESCorpusReader('./Valian', '.*.xml')
valian.fileids()

# print words
valian.words('./Valian/01a.xml')

# print sentences
valian.sents('./Valian/01a.xml')

# print MLU (mean length of utterance)
valian.MLU('./Valian/01a.xml')

~

Here is the output, which is either an empty list or 0. I was expecting a list of words or a list of sentences.

~python

>>> valian.words('/01a.xml')   
[]

>>> valian.sents('/01a.xml') 
[]

>>> valian.MLU('/01a.xml') 
[0]

~

This is a bit odd, as I was just following the NLTK documentation (https://www.nltk.org/howto/childes.html). Thank you very much for your help!

Upvotes: 0

Views: 106

Answers (1)

Peter Schoener

Reputation: 21

This is a bug in NLTK 3.6 and 3.7. It should be resolved with the 3.8 release; in the meantime, I got around it by downgrading to 3.5.

GH tracking issue, PR, duplicate question
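
To check whether an installed copy falls in that range, something like the following sketch may help. It assumes the affected releases are exactly the 3.6.x and 3.7.x series mentioned above. `is_affected` and `installed_nltk_version` are hypothetical helper names for illustration, not part of NLTK.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# (major, minor) pairs reported as affected in the answer above.
# This is an assumption taken from that answer, not from the NLTK changelog.
AFFECTED_SERIES = {(3, 6), (3, 7)}


def is_affected(nltk_version):
    """Return True if the version string is in the 3.6.x or 3.7.x series."""
    match = re.match(r"(\d+)\.(\d+)", nltk_version)
    if match is None:
        # Unrecognised version format: do not claim it is affected.
        return False
    return (int(match.group(1)), int(match.group(2))) in AFFECTED_SERIES


def installed_nltk_version():
    """Return the installed NLTK version string, or None if NLTK is absent."""
    try:
        return version("nltk")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_nltk_version()
    if found is None:
        print("NLTK is not installed")
    elif is_affected(found):
        print(f"NLTK {found} is in the reported range; consider: pip install nltk==3.5")
    else:
        print(f"NLTK {found} is outside the reported range")
```

If the script reports an affected version, pinning to 3.5 as described above is the workaround that worked for me; empty results on an unaffected version would point to a different cause.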

Upvotes: 0
