Iksen
Iksen

Reputation: 24

MariaDB : convert string to int when importing from CSV, while removing spaces in number

I have a "big" csv file (around 1GB of data, 3M lines) to import into a MariaDB table.

The thing is, almost every field of every line is considered as a string. So, I have to convert "1 337" (string) into 1337 (integer).

Here is the script used to import into the table :

LOAD DATA LOW_PRIORITY LOCAL
    INFILE 'data.txt'
    INTO TABLE `test`.`test_import`
    CHARACTER SET utf8
    FIELDS TERMINATED BY ';'
    OPTIONALLY ENCLOSED BY '"'
    ESCAPED BY '"'
    LINES TERMINATED BY '\r\n'
    (`id`,
        `data`,
        @NumberOne,
        @NumberTwo,
        @NumberThree,
        @NumberFour)
        SET `Number One` = REPLACE(@NumberOne, ' ', ''),
            `Number Two` = REPLACE(@NumberOne, ' ', ''),
            `Number Three` = REPLACE(@NumberOne, ' ', ''),
            `Number Four` = REPLACE(@NumberOne, ' ', '');

Using this script, there is no problem importing numbers below 999. But beginning with 1000 (written "1 000" in my csv), all I have is a warning (Truncated incorrect INTEGER value: '1 000') and the value 1 in my database.

The "funny" thing is, when I try with this :

SET `Number One` = REPLACE(@NumberOne, '1', 'k'),
                `Number Two` = REPLACE(@NumberOne, '1', 'k'),
                `Number Three` = REPLACE(@NumberOne, '1', 'k'),
                `Number Four` = REPLACE(@NumberOne, '1', 'k')

-> REPLACE() works, "1 000" become "k 000".

So, how can I use REPLACE() to remove spaces in numbers ? Or, how to make CAST()/CONVERT() working correctly on strings like "1 337" ?


Some more informations.

Here is a fresh test table :

CREATE OR REPLACE TABLE test_spaces_extr (
    `Identifier`   tinytext,
    `First name`   tinytext,
    `Last name`    tinytext,
    `Number One`   int unsigned,
    `Number Two`   int unsigned,
    `Number Three` int unsigned,
    `Number Four`  int unsigned,
    `Number Five`  int unsigned,
    `Number Six`   int unsigned,
    `Number Seven` int unsigned
);

Here is the script to import CSV :

LOAD DATA LOW_PRIORITY LOCAL
    INFILE 'some_data.txt'
    INTO TABLE `test`.`test_spaces_extr`
    CHARACTER SET utf8
    FIELDS TERMINATED BY ';'
    OPTIONALLY ENCLOSED BY '"'
    ESCAPED BY '"'
    LINES TERMINATED BY '\r\n'
    (`Identifier`,
        `First name`,
        `Last name`,
        @NumberOne,
        @NumberTwo,
        @NumberThree,
        @NumberFour,
        @NumberFive,
        @NumberSix,
        @NumberSeven)
        SET `Number One` = REPLACE(@NumberOne, ' ', ''),
            `Number Two` = REPLACE(@NumberTwo, ' ', ''),
            `Number Three` = REPLACE(@NumberThree, ' ', ''),
            `Number Four` = REPLACE(@NumberFour, ' ', ''),
            `Number Five` = REPLACE(@NumberFive, ' ', ''),
            `Number Six` = REPLACE(@NumberSix, ' ', ''),
            `Number Seven` = REPLACE(@NumberSeven, ' ', '');

Here is the full content of some_data.txt :

"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"

(One line, yes.)

Here is the result :

"Identifier"    "First name"    "Last name" "Number One"    "Number Two"    "Number Three"  "Number Four"   "Number Five"   "Number Six"    "Number Seven"
"3efa639b3a"    "Censored"  "Censored"  "7896"  "3468"  "3854"  "5000"  "1234"  "9654"  "0"

In fact, the "Number" fields became integers here. All of them, but not the last one ("Number Seven" -> "0").

It's becoming weirder ...

Upvotes: 0

Views: 1799

Answers (1)

wchiquito
wchiquito

Reputation: 16551

I can't reproduce the problem:

$ mysql -u user -p --column-type-info
MariaDB [(none)]> SELECT VERSION();
Field   1:  `VERSION()`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8_general_ci (33)
Length:     72
Max_length: 24
Decimals:   31
Flags:      NOT_NULL 


+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT CAST(REPLACE('1 337', ' ', '') AS UNSIGNED);
Field   1:  `CAST(REPLACE('1 337', ' ', '') AS UNSIGNED)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     5
Max_length: 4
Decimals:   0
Flags:      NOT_NULL UNSIGNED BINARY NUM 


+---------------------------------------------+
| CAST(REPLACE('1 337', ' ', '') AS UNSIGNED) |
+---------------------------------------------+
|                                        1337 |
+---------------------------------------------+
1 row in set (0.00 sec)

UPDATE

File: /path/to/data.csv

"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"
MariaDB [_]> SELECT VERSION();
+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [_]> DROP TABLE IF EXISTS `test_spaces_extr`;
Query OK, 0 rows affected (0.07 sec)

MariaDB [_]> CREATE OR REPLACE TABLE `test_spaces_extr` (
    ->     `Identifier`   tinytext,
    ->     `First name`   tinytext,
    ->     `Last name`    tinytext,
    ->     `Number One`   int unsigned,
    ->     `Number Two`   int unsigned,
    ->     `Number Three` int unsigned,
    ->     `Number Four`  int unsigned,
    ->     `Number Five`  int unsigned,
    ->     `Number Six`   int unsigned,
    ->     `Number Seven` int unsigned
    -> );
Query OK, 0 rows affected (0.00 sec)

MariaDB [_]> LOAD DATA LOW_PRIORITY LOCAL INFILE '/path/to/data.csv'
    ->   INTO TABLE `test_spaces_extr`
    ->   CHARACTER SET utf8
    ->   FIELDS TERMINATED BY ';'
    ->   OPTIONALLY ENCLOSED BY '"'
    ->   ESCAPED BY '"'
    ->   LINES TERMINATED BY '\r\n'
    ->   (
    ->     `Identifier`,
    ->     `First name`,
    ->     `Last name`,
    ->     @`NumberOne`,
    ->     @`NumberTwo`,
    ->     @`NumberThree`,
    ->     @`NumberFour`,
    ->     @`NumberFive`,
    ->     @`NumberSix`,
    ->     @`NumberSeven`
    ->   )
    ->   SET
    ->   `Number One` = REPLACE(@`NumberOne`, ' ', ''),
    ->   `Number Two` = REPLACE(@`NumberTwo`, ' ', ''),
    ->   `Number Three` = REPLACE(@`NumberThree`, ' ', ''),
    ->   `Number Four` = REPLACE(@`NumberFour`, ' ', ''),
    ->   `Number Five` = REPLACE(@`NumberFive`, ' ', ''),
    ->   `Number Six` = REPLACE(@`NumberSix`, ' ', ''),
    ->   `Number Seven` = REPLACE(@`NumberSeven`, ' ', '');
Query OK, 1 row affected (0.00 sec)                  
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

MariaDB [_]> SELECT
    ->   `Identifier`,
    ->   `First name`,
    ->   `Last name`,
    ->   `Number One`,
    ->   `Number Two`,
    ->   `Number Three`,
    ->   `Number Four`,
    ->   `Number Five`,
    ->   `Number Six`,
    ->   `Number Seven`
    -> FROM
    ->   `test_spaces_extr`;
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| Identifier | First name | Last name | Number One | Number Two | Number Three | Number Four | Number Five | Number Six | Number Seven |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| 3efa639b3a | Censored   | Censored  |       7896 |       3468 |         3854 |        5000 |        1234 |       9654 |         1337 |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
1 row in set (0.00 sec)

Upvotes: 1

Related Questions