Reputation: 1019
I currently use spaCy to traverse the dependency tree and generate entities.
nlp = get_spacy_model(detect_lang(unicode_text))
doc = nlp(unicode_text)
entities = set()
for sentence in doc.sents:
    # traverse the tree, picking up entities
    for token in sentence.subtree:
        # pick entities using some pre-defined rules
entities.discard('')
return entities
Are there any good Java alternatives for spaCy?
I am looking for libraries which generate the dependency tree the way spaCy does.
EDIT:
I looked into Stanford Parser. However, it generated the following parse tree:
ROOT
|
NP
_______________|_________
| NP
| _________|___
| | PP
| | ________|___
NP NP | NP
____|__________ | | _______|____
DT JJ JJ NN NNS IN DT JJ NN
| | | | | | | | |
the quick brown fox jumps over the lazy dog
However, I am looking for a dependency tree structure like the one spaCy produces:
jumps_VBZ
__________________________|___________________
| | | | | over_IN
| | | | | |
| | | | | dog_NN
| | | | | _______|_______
The_DT quick_JJ brown_JJ fox_NN ._. the_DT lazy_JJ
Upvotes: 5
Views: 11743
Reputation: 168
I recently released spaCy4j, which mimics spaCy's Token container objects and integrates with spacy-server or CoreNLP.
Once you have a running Docker container of spacy-server (very easy to set up), it's as easy as:
// Create a new spacy-server adapter with host and port matching a running instance of spacy-server.
SpaCyAdapter adapter = SpaCyServerAdapter.create("localhost", 8080);
// Create a new SpaCy object. It is thread safe and should be reused across our app
SpaCy spacy = SpaCy.create(adapter);
// Parse a doc
Doc doc = spacy.nlp("My head feels like a frisbee, twice its normal size.");
// Inspect tokens
for (Token token : doc.tokens()) {
    System.out.printf("Token: %s, Tag: %s, Pos: %s, Dependency: %s%n",
            token.text(), token.tag(), token.pos(), token.dependency());
}
Feel free to reach out via GitHub with any questions.
Upvotes: 2
Reputation: 681
Another way to integrate with Java and other languages is to use a spaCy REST API. For example, https://github.com/jgontrum/spacy-api-docker provides a Dockerized spaCy REST API.
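If you want to call such a Dockerized service from Java, a minimal sketch using Java 11's built-in HttpClient could look like the following. The port, the /dep endpoint path and the JSON payload are assumptions for illustration; check them against the spacy-api-docker documentation before use.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SpacyRestClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical endpoint and payload; verify the actual route and JSON
        // schema in the spacy-api-docker README.
        String body = "{\"text\": \"The quick brown fox jumps over the lazy dog.\", \"model\": \"en\"}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/dep"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The response is JSON describing the dependency parse, which can be
        // mapped onto whatever tree structure your Java code needs.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}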
Upvotes: 1
Reputation: 9
spaCy can be run from a Java program.
First, create the virtual environment from the command prompt by executing the following commands:
python3 -m venv env
source ./env/bin/activate
pip install -U spacy
python -m spacy download en
python -m spacy download de
Create a bash file spacyt.sh with the following commands, parallel to the env folder:
#!/bin/bash
python3 -m venv env
source ./env/bin/activate
python test1.py
Place the spaCy code in a Python script, test1.py:
import spacy

print('This is a test script of spacy')
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is a sentence")
print([(w.text, w.pos_) for w in doc])
# instead of print we can write to a file for further processing
In the Java program, run the bash file:
String cmd = "./spacyt.sh";
try {
    Process p = Runtime.getRuntime().exec(cmd);
    p.waitFor();
    System.out.println("cmdT executed!");
} catch (Exception e) {
    e.printStackTrace();
}
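If you would rather capture the script's output directly in Java instead of writing to an intermediate file, a minimal sketch reading the process's standard output could look like this (the script name is the same spacyt.sh as above):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SpacyRunner {
    public static void main(String[] args) throws Exception {
        Process p = Runtime.getRuntime().exec("./spacyt.sh");

        // Read whatever test1.py prints (tokens and POS tags) line by line.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        p.waitFor();
    }
}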
Upvotes: -2
Reputation: 5759
You're looking for the Stanford Dependency Parser. Like most of the Stanford tools, it is also bundled with Stanford CoreNLP under the depparse
annotator. Other dependency parsers include the Malt parser (a feature-based shift-reduce parser) and Ryan McDonald's MST parser (an accurate but slower maximum-spanning-tree parser).
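For completeness, a minimal sketch of getting a spaCy-style dependency graph out of CoreNLP's depparse annotator; the exact annotation classes can vary slightly between CoreNLP versions, so treat this as illustrative:

import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class DepParseExample {
    public static void main(String[] args) {
        // Run tokenization, sentence splitting, POS tagging and dependency parsing.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,depparse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation("The quick brown fox jumps over the lazy dog.");
        pipeline.annotate(document);

        // Each sentence carries a SemanticGraph: the dependency tree, analogous
        // to what spaCy exposes via token.head / token.children.
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            SemanticGraph dependencies =
                    sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
            System.out.println(dependencies.toString(SemanticGraph.OutputFormat.LIST));
        }
    }
}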
Upvotes: 2