fstab

Reputation: 5029

RapidMiner: CSV with real numbers using commas instead of dots

I have a problem importing a CSV file into RapidMiner. Floating-point values are written with a comma instead of a dot as the separator between the integer and fractional parts.

Does anyone know how to correctly import values formatted this way?

sample data:

BMI;1;0;1;1;1;blue;-0,138812155;0,520378909;5;0;50;107;0;9;0;other;good;2011
BMI;1;0;1;1;1;pink;-0,624654696;;8;0;73;120;1;3;0,882638889;other;good;2011

RapidMiner interprets these columns as "polynominal". Forcing the type to "real" only gives a correct interpretation of the "0" values.

thanks

Upvotes: 0

Views: 2748

Answers (3)

Bala Deshpande

Reputation: 165

This seems to be a very old request. Not sure if this will help you, but this may help others with a similar situation.

Step 1: in the "Read CSV" operator, under "import configuration wizard", make sure you select "Semicolon" as the separator

Step 2: use the "Guess Types" operator. Set Attribute Filter Type -> Subset, then under Select Attributes pick attributes 8, 9 and 16 (based on your example above). Finally, change "decimal point character" to "," and you should be all set.
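For readers working outside RapidMiner, the idea behind a configurable "decimal point character" can be roughly sketched in Java with java.text.DecimalFormat. This is only an illustration, not how RapidMiner implements the operator; the class name DecimalCommaParser and the method parse are hypothetical names chosen for this example.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper: parses numbers that use ',' as the decimal separator.
class DecimalCommaParser {
    static double parse(String text) throws ParseException {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(',');   // treat the comma as the decimal point
        symbols.setGroupingSeparator('.');  // keep the two separators distinct
        DecimalFormat format = new DecimalFormat("#.#", symbols);
        format.setGroupingUsed(false);      // no thousands separators expected
        return format.parse(text).doubleValue();
    }

    public static void main(String[] args) throws ParseException {
        // One of the values from the sample data in the question.
        System.out.println(parse("-0,138812155"));
    }
}
```

Empty fields, such as field 9 in the second sample row, would make parse throw a ParseException, so they need to be handled separately as missing values.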

Hope this helps (someone!)

Upvotes: 3

700 Software

Reputation: 87783

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CommaDecimals {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("c:\\path\\semicolons and numbers and commas.csv"));
        try {
            for (String line; (line = br.readLine()) != null; ) {
                // line now holds a single line from the file; this body runs once per line.
                // Split on the semicolon. split() takes a regex, so be careful when changing the
                // delimiter: for example, "." matches any character, not just a dot.
                String[] array = line.split(";");
                // Take field 7 (the eighth field) and parse it as a double (high-precision floating point).
                // Replace ',' with '.' first so parseDouble does not throw.
                double firstDouble = Double.parseDouble(array[7].replace(',', '.'));
                System.err.println("Have a number " + firstDouble);
                System.err.println("Can play with it " + (firstDouble * 2.0));
            }
        } finally {
            br.close(); // Free resources (and unlock the file on Windows).
        }
    }
}

Upvotes: 0

JustinKSU

Reputation: 4989

Use a semicolon as the delimiter. You can use java.util.Scanner to read each line and String.split() to split it on the semicolon. When you get a token containing a comma, use String.replace() to change the comma to a decimal point, then parse the result with Float.parseFloat().

Hope this answers your question.
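A minimal sketch of those steps, using the two sample rows from the question as inline input instead of a file. The class name ScannerCommaDecimals and the helper parseCommaDecimals are hypothetical names for this example, and it assumes that every token containing a comma is meant to be a number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerCommaDecimals {
    // Returns every token that contains a comma, parsed as a float
    // after swapping the decimal comma for a dot.
    static List<Float> parseCommaDecimals(String line) {
        List<Float> values = new ArrayList<>();
        for (String token : line.split(";")) {
            if (token.contains(",")) {
                values.add(Float.parseFloat(token.replace(',', '.')));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String data = "BMI;1;0;1;1;1;blue;-0,138812155;0,520378909;5;0;50;107;0;9;0;other;good;2011\n"
                + "BMI;1;0;1;1;1;pink;-0,624654696;;8;0;73;120;1;3;0,882638889;other;good;2011";
        // Scanner reads the input line by line; a file could be read the same way.
        try (Scanner scanner = new Scanner(data)) {
            while (scanner.hasNextLine()) {
                System.out.println(parseCommaDecimals(scanner.nextLine()));
            }
        }
    }
}
```

Float keeps only about seven significant digits, so values such as -0,138812155 lose precision; Double.parseDouble() is the safer choice if that matters.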

Upvotes: 0
