Reputation: 11
i want to use TreeTagger module to tag POS-information on the raw corpus.
As it seems to be faster to use GPU via Google Colab, I installed TreeTagger module, but Colab codes cannot locate TreeTagger directory.
The error type is like this: TreeTaggerError: Can't locate TreeTagger directory (and no TAGDIR specified)
Please tell me where I should uplaod the treetagger folder.
Upvotes: 1
Views: 970
Reputation: 581
You have to specify directory:
treetaggerwrapper.TreeTagger(TAGLANG='en', TAGDIR='treetagger/') # treetagger is the installation dir
Installation in Colab.
Follow the instructions on the website.
In one cell in Colab you have to put the following (for other (not English) languages put other link for parameter files):
%%bash
mkdir treetagger
cd treetagger
# Download the tagger package for your system (PC-Linux, Mac OS-X, ARM64, ARMHF, ARM-Android, PPC64le-Linux).
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.4.tar.gz
tar -xzvf tree-tagger-linux-3.2.4.tar.gz
# Download the tagging scripts into the same directory.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tagger-scripts.tar.gz
gunzip tagger-scripts.tar.gz
# Download the installation script install-tagger.sh.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/install-tagger.sh
# Download the parameter files for the languages you want to process.
# list of all files (parameter files) https://cis.lmu.de/~schmid/tools/TreeTagger/#parfiles
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/english.par.gz
sh install-tagger.sh
cd ..
sudo pip install treetaggerwrapper
And in the other following cell you can check the installation:
>>> import pprint # For proper print of sequences.
>>> import treetaggerwrapper
>>> #1) build a TreeTagger wrapper:
>>> tagger = treetaggerwrapper.TreeTagger(TAGLANG='en', TAGDIR='treetagger/')
>>> #2) tag your text.
>>> tags = tagger.tag_text("This is a very short text to tag.")
>>> #3) use the tags list... (list of string output from TreeTagger).
>>> pprint.pprint(tags)
Upvotes: 3