yamini harinathan

Reputation: 61

How to read a paragraph in GATE (natural language processing)

I am using the GATE tool for natural language processing. I am using Java code to read lines from the sentence and get the keywords. What modification has to be made in creole.xml to read a complete paragraph?

Upvotes: 1

Views: 1498

Answers (3)

Hari Gudigundla

Reputation: 822

This worked for me:

  1. Initialize GATE
  2. Create the GATE controller (defaults with ANNIE)
  3. Create a corpus, set the corpus on the controller, create a GATE document (gateDoc), and add it to the corpus
  4. Call controller.execute();
  5. Run the following code

    // Original text of the document. This feature is only populated when the
    // document was created with preserveOriginalContent set to true.
    FeatureMap features = gateDoc.getFeatures();
    String originalContent = (String)
            features.get(GateConstants.ORIGINAL_DOCUMENT_CONTENT_FEATURE_NAME);
    int length = originalContent.length();

    // Add "paragraph" annotations over the whole document. The set name is
    // null, and the annotations are read back from the default set below.
    TextualDocumentFormat tdf = new TextualDocumentFormat();
    try {
        tdf.annotateParagraphs(gateDoc, 0, length, null);
    } catch (DocumentFormatException e) {
        e.printStackTrace();
    }

    AnnotationSet paragraphs = gateDoc.getAnnotations().get("paragraph");

    // Sort by offset, since an AnnotationSet has no guaranteed iteration order.
    // SortedAnnotationList is presumably the helper class from StandAloneAnnie.java.
    SortedAnnotationList sortedParagraphs = new SortedAnnotationList();
    for (Annotation paragraph : paragraphs) {
        sortedParagraphs.addSortedExclusive(paragraph);
    }

    System.out.println("Number of Paragraphs - " + paragraphs.size());

    // Print each paragraph's text in document order.
    for (Object item : sortedParagraphs) {
        Annotation paragraph = (Annotation) item;
        int start = paragraph.getStartNode().getOffset().intValue();
        int end = paragraph.getEndNode().getOffset().intValue();
        System.out.println(originalContent.substring(start, end));
    }
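If you only need the paragraph boundaries and want to see what the offsets look like without running GATE at all, here is a rough plain-Java sketch of the same idea. It is not GATE's own algorithm: it simply assumes paragraphs are separated by one or more blank lines, and the names ParagraphSplitter and Span are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits plain text into paragraphs separated by blank lines. */
final class ParagraphSplitter {

    /** A paragraph's text plus its character offsets in the source string. */
    static final class Span {
        final int start;
        final int end;
        final String text;

        Span(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // A separator is a line break followed by at least one more line break,
    // with only spaces or tabs allowed in between (assumption of this sketch).
    private static final Pattern BLANK_LINE = Pattern.compile("\\R(?:[ \\t]*\\R)+");

    static List<Span> split(String content) {
        List<Span> spans = new ArrayList<>();
        Matcher m = BLANK_LINE.matcher(content);
        int start = 0;
        while (m.find()) {
            addIfNotBlank(spans, content, start, m.start());
            start = m.end();
        }
        addIfNotBlank(spans, content, start, content.length());
        return spans;
    }

    // Skips empty or whitespace-only chunks so they are not reported as paragraphs.
    private static void addIfNotBlank(List<Span> spans, String content, int start, int end) {
        String text = content.substring(start, end);
        if (!text.trim().isEmpty()) {
            spans.add(new Span(start, end, text));
        }
    }

    public static void main(String[] args) {
        String input = "What is your name, some text\n\nSometext sometext sometext";
        for (Span s : split(input)) {
            System.out.println(s.start + "-" + s.end + ": " + s.text);
        }
    }
}
```

The start and end values play the same role as the start and end node offsets of a paragraph annotation, so the substring call works the same way in both versions.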
    

Upvotes: 0

user1219801

Reputation: 169

You can read the annotations from the "Original markups" annotation set:

doc.getNamedAnnotationSets().get("Original markups")

If it doesn't give any results, you can use the method annotateParagraphs() of the class gate.corpora.TextualDocumentFormat.

Upvotes: 2

Mohammed Joraid

Reputation: 6480

I am not sure what you mean, but if you use ANNIE you can put each paragraph in a separate tag. I used StandAloneAnnie.java:

http://gate.ac.uk/wiki/code-repository/src/sheffield/examples/StandAloneAnnie.java

If the user enters

What is your name, ,some text sometext

Sometext sometext sometext

The result will be

<paragraph>What is your name, ,some text sometext</paragraph>

<paragraph>Sometext sometext sometext</paragraph>

You can get more tags, such as Person, Location, Sentence, or Token for each word.

If the user enters, for example,

Where To Dine In Kuala Lumpur. Helton Hotel

The result will be an XML file that contains

<paragraph>
        <Sentence>
        <Token>Where</Token>
        <Token>To</Token>
        <Token>
        <Unknown>Dine</Unknown>
        </Token>
        <Token>In</Token>
        <Lookup>
        <Location>
        <Token>Kuala</Token>
        <Token>
        <Lookup>Lumpur</Lookup>
        </Token>
        </Location>
        </Lookup>
        <Token>
        <Split>.</Split>
        </Token>
        </Sentence>

        <Sentence>
        <Organization>
        <Token>Helton</Token>
        <Token>
        <Lookup>
        <Lookup>Hotel</Lookup>
        </Lookup>
        </Token>
        </Organization>
        </Sentence>

     </paragraph>
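Once you have output like the above, pulling the paragraph text back out takes only the JDK's built-in DOM parser. This is just a sketch under a couple of assumptions: the file is well-formed XML with a single root element (the doc wrapper in the sample string is made up for the example), and ParagraphXmlReader is a hypothetical class name, not part of GATE.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: collects the text of every paragraph element in an XML string. */
final class ParagraphXmlReader {

    static List<String> paragraphTexts(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document dom = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList nodes = dom.getElementsByTagName("paragraph");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() concatenates the text of all nested tags
            // (Sentence, Token, ...); collapse the layout whitespace afterwards.
            String text = nodes.item(i).getTextContent();
            texts.add(text.replaceAll("\\s+", " ").trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // The <doc> root is an assumption added so the sample is well-formed.
        String xml = "<doc>"
                + "<paragraph>\n<Token>Where</Token>\n<Token>To</Token>\n</paragraph>"
                + "<paragraph>Sometext sometext sometext</paragraph>"
                + "</doc>";
        for (String paragraph : paragraphTexts(xml)) {
            System.out.println(paragraph);
        }
    }
}
```

Note that words are only separated in the result if there is whitespace between the tags in the file. If the tokens are written back to back, their text will run together.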

I am currently trying to get synonyms but have been unable to do so :( I want the result to include other options. For the sentence above, for example, I want the result to have Dine -> Dinner, Food, Eat, Restaurant.

Upvotes: 0
