Brett

RapidMiner: Classifying new examples without re-running the existing trained model

How can I classify new examples with my trained model without re-training the model every time?

Training the model takes some time (1 hour), and I'd like to classify new observations without having to wait every time for the training data to be used to create the model again.

I've never separated these two steps before. I have always had them in the same process flow window, because I don't know how to execute them independently.

Answers (1)

maerch

You can store the trained model either in the repository (use the "Store" operator) or as a file (operator "Write Model"). Usually you will use the "Store" operator and later read the model back from the repository with the "Retrieve" operator, either in the same process or in any other process.

RapidMiner may complain that it receives a generic IOObject instead of a Model object, but the process will run anyway, and the message disappears once the metadata of the IOObject has propagated.

Here is an example:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve Golf" width="90" x="45" y="75">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="decision_tree" compatibility="5.3.013" expanded="true" height="76" name="Decision Tree" width="90" x="179" y="75"/>
      <operator activated="true" class="store" compatibility="5.3.013" expanded="true" height="60" name="Store" width="90" x="313" y="75">
        <parameter key="repository_entry" value="my_model"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve" width="90" x="45" y="210">
        <parameter key="repository_entry" value="my_model"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve Golf-Testset" width="90" x="45" y="300">
        <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="Apply Model" width="90" x="179" y="210">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve Golf" from_port="output" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Decision Tree" from_port="model" to_op="Store" to_port="input"/>
      <connect from_op="Retrieve" from_port="output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Retrieve Golf-Testset" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
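To avoid retraining, the example above can be split in two: run the upper part (Retrieve Golf, Decision Tree, Store) once, and keep a separate process that only retrieves the stored model and applies it. The following is a sketch of that scoring-only process, obtained by dropping the training operators from the example. It assumes the scoring process is saved in the same repository folder as the training process, since `my_model` is a relative repository entry, and it uses `//Samples/data/Golf-Testset` only as a stand-in for your new examples.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Scoring-only process: loads the stored model and applies it to new examples; there is no training step. -->
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
        <parameter key="repository_entry" value="my_model"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve Golf-Testset" width="90" x="45" y="165">
        <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="Apply Model" width="90" x="179" y="75">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Retrieve Golf-Testset" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```

With this split, the one-hour training process only needs to run again when the model should be rebuilt. Point the second Retrieve at your own new examples; they need the same attributes the model was trained on.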
