Reputation: 25
My Problem is that if I have a missing value in a row, I want to replace this value with another one from that row. For example I want to replace the missing value with the appropriate "Belegnummer"
Upvotes: 1
Views: 1018
Reputation: 792
in general there is a operator called Replace Missing Values which does exactly what the name suggests.
In your special case you want to access the values of another attribute (column), so the Generate Attributes operator offers a very powerful expression builder where you can declare an If-statement of that form if(a1==MISSING_NUMERIC, a2,a1)
See the screenshot above for an example or copy&paste the process XML into your RapidMiner process window.
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.000-BETA" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="9.0.000-BETA" expanded="true" height="82" name="Subprocess" width="90" x="112" y="34">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.0.000-BETA" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="declare_missing_value" compatibility="9.0.000-BETA" expanded="true" height="82" name="Declare Missing Value" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="a1"/>
<parameter key="mode" value="expression"/>
<parameter key="expression_value" value="a1 <5"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.0.000-BETA" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="a2|a1"/>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Declare Missing Value" to_port="example set input"/>
<connect from_op="Declare Missing Value" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_attributes" compatibility="9.0.000-BETA" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
<list key="function_descriptions">
<parameter key="a1_new" value="if(a1==MISSING_NUMERIC, a2,a1)"/>
</list>
</operator>
<connect from_op="Subprocess" from_port="out 1" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="181" resized="true" width="529" x="275" y="126">With the expression parser more complex statements can be defined. In this case:<br>if(a1==MISSING_NUMERIC, a2,a1)<br/><br/>meaning that if the value of attribute a1 is missing, it will be replaced by the value of a2 otherwise the value of a1 is kept.<br/><br/>Instead of creating a new attribute the old one can also be overwritten<br/><br></description>
</process>
</operator>
</process>
Upvotes: 1