Bluetail

How to load a tsv file for MALLET using FileInputStream in Java?

I want to load the flat text file passed in as 'TMFlatFile' (the .tsv format that MALLET imports) into the fileReader variable. I have written a method, RunTopicModelling(), but I am having a problem with the try/catch block. I have created my File and FileInputStream objects, but I don't know how to load the file correctly into fileReader.

I get the error "The method read(CharBuffer) in the type InputStreamReader is not applicable for the arguments (int)".

public class TopicModelling {

    private void StartTopicModellingProcess(String filePath) {
        JSONIOHelper jsonIO = new JSONIOHelper();
        jsonIO.LoadJSON(filePath);
        ConcurrentHashMap<String, String> lemmas = jsonIO.GetDocumentsFromJSONStructure();

        SaveLemmaDataToFile("topicdata.txt", lemmas);
    }
    
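    private void SaveLemmaDataToFile(String TMFlatFile, ConcurrentHashMap<String, String> lemmas) {
        // Open the writer once, outside the loop: a new FileWriter per entry
        // truncates the file each time, leaving only the last document.
        try (FileWriter writer = new FileWriter(TMFlatFile)) {
            for (Entry<String, String> entry : lemmas.entrySet()) {
                // one document per line: name <TAB> label <TAB> text
                writer.write(entry.getKey() + "\ten\t" + entry.getValue() + "\r\n");
            }
        } catch (Exception e) {
            System.out.println("Saving to flat text file failed...");
        }
    }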

    private void RunTopicModelling(String TMFlatFile, int numTopics, int numThreads, int numIterations) {
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();

        // Pipes: tokenise, map to features
        pipeList.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        pipeList.add(new TokenSequence2FeatureSequence());

        InstanceList instances = new InstanceList(new SerialPipes(pipeList));

        InputStreamReader fileReader = null;
        // load the file passed in via TMFlatFile into fileReader - this is the block I have a problem with
        try {
            File inFile = new File(TMFlatFile);
            FileInputStream fis = new FileInputStream(inFile);

            int line;
            while ((line = fis.read()) != -1) {
                fileReader.read(line); // compile error: InputStreamReader has no read(int) method
            }
            fis.close();
        } catch (Exception e) {
            System.out.println("File Load Failed");
            System.exit(1);
        }

        // linking data to the pipeline
        instances.addThruPipe(new CsvIterator(fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"), 3, 2, 1));
    }
}

Can someone tell me what is the correct way to do this?


Answers (1)

David Mimno

It's hard to say what the immediate issue is, because the code sample provided looks like it's missing important parts and would not compile as written (for example, the "Exception e)" fragment and a regex without quotes).
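On the specific error quoted in the question: InputStreamReader has no read(int) method (its read overloads return a single char or fill a char[] or CharBuffer), and it is meant to wrap the byte stream rather than be fed bytes one at a time. A minimal JDK-only sketch, assuming the flat file is UTF-8; TsvReaderSketch, openTsv and readLines are hypothetical names used for illustration:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TsvReaderSketch {

    // Wrap the byte stream in a character Reader instead of copying bytes by hand.
    // Assumes UTF-8; the caller owns the returned Reader and must close it.
    static Reader openTsv(String path) throws IOException {
        return new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8);
    }

    // Demonstration only: read every line through that same Reader.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(openTsv(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

In RunTopicModelling, the byte loop could then be replaced by a single assignment, fileReader = new InputStreamReader(new FileInputStream(TMFlatFile), StandardCharsets.UTF_8); and that reader passed unchanged to new CsvIterator(fileReader, ...), which reads the lines itself.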

The data import developer's guide at https://mimno.github.io/Mallet/import-devel has sample code that should be a good starting point.
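Separately from the loading problem, it can help to check that the flat file itself is well formed before handing it to MALLET. A hedged JDK-only sketch (not taken from the guide): it writes the lemma map with one writer and parses each line back with the same pattern the question passes to CsvIterator, using group 1 as the name, group 2 as the label and group 3 as the text. FlatFileSketch, writeFlatFile and parseLine are hypothetical names, and UTF-8 is an assumption:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlatFileSketch {

    // The line pattern from the question: group 1 = name, group 2 = label, group 3 = text.
    static final Pattern LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    // Open ONE writer for the whole map so every entry ends up in the file.
    // Opening a FileWriter per entry (without append mode) truncates the file each time.
    static void writeFlatFile(Path out, Map<String, String> lemmas) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> entry : lemmas.entrySet()) {
                writer.write(entry.getKey() + "\ten\t" + entry.getValue());
                writer.write("\r\n");
            }
        }
    }

    // Split one line into {name, label, text}, mirroring the group numbers 1, 2, 3
    // used in the question's CsvIterator call; returns null if the line does not match.
    static String[] parseLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }
}
```

Because the name group is \S*, a document key that contains whitespace would spill into the label group, so keys should be whitespace-free.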
