leoce
leoce

Reputation: 735

MySQL float column with empty value ERROR 1265 (01000): Data truncated for column

I'm trying to store the GDELT dataset in MySQL database (MySQL 8.0, RHEL 7) but it returned an ERROR 1265(01000) because one float column has empty values in it:

CREATE TABLE event (
    GlobalEventID INT NOT NULL,
    Day INT NOT NULL,
    MonthYear MEDIUMINT NOT NULL,
    Year SMALLINT NOT NULL,
    FractionDate FLOAT NOT NULL,
    Actor1Code TINYTEXT NULL,
    Actor1Name TINYTEXT NULL,
    Actor1CountryCode TINYTEXT NULL,
    Actor1KnownGroupCode TINYTEXT NULL,
    Actor1EthnicCode TINYTEXT NULL,
    Actor1Religion1Code TINYTEXT NULL,
    Actor1Religion2Code TINYTEXT NULL,
    Actor1Type1Code TINYTEXT NULL,
    Actor1Type2Code TINYTEXT NULL,
    Actor1Type3Code TINYTEXT NULL,
    Actor2Code TINYTEXT NULL,
    Actor2Name TINYTEXT NULL,
    Actor2CountryCode TINYTEXT NULL,
    Actor2KnownGroupCode TINYTEXT NULL,
    Actor2EthnicCode TINYTEXT NULL,
    Actor2Religion1Code TINYTEXT NULL,
    Actor2Religion2Code TINYTEXT NULL,
    Actor2Type1Code TINYTEXT NULL,
    Actor2Type2Code TINYTEXT NULL,
    Actor2Type3Code TINYTEXT NULL,
    IsRootEvent TINYINT NOT NULL,
    EventCode TINYTEXT NOT NULL,
    EventBaseCode TINYTEXT NOT NULL,
    EventRootCode TINYTEXT NOT NULL,
    QuadClass TINYINT NOT NULL,
    GoldsteinScale FLOAT NOT NULL,
    NumMentions INT NOT NULL,
    NumSources INT NOT NULL,
    NumArticles INT NOT NULL,
    AvgTone FLOAT NOT NULL,
    Actor1Geo_Type TINYINT NULL,
    Actor1Geo_Fullname TEXT NULL,
    Actor1Geo_CountryCode TINYTEXT NULL,
    Actor1Geo_ADM1Code TINYTEXT NULL,
    Actor1Geo_ADM2Code TINYTEXT NULL,
    Actor1Geo_Lat FLOAT NULL,
    Actor1Geo_Long FLOAT NULL,
    Actor1Geo_FeatureID TINYTEXT NULL,
    Actor2Geo_Type TINYINT NULL,
    Actor2Geo_Fullname TEXT NULL,
    Actor2Geo_CountryCode TINYTEXT NULL,
    Actor2Geo_ADM1Code TINYTEXT NULL,
    Actor2Geo_ADM2Code TINYTEXT NULL,
    Actor2Geo_Lat FLOAT NULL,
    Actor2Geo_Long FLOAT NULL,
    Actor2Geo_FeatureID TINYTEXT NULL,
    ActionGeo_Type TINYINT NULL,
    ActionGeo_Fullname TEXT NULL,
    ActionGeo_CountryCode TINYTEXT NULL,
    ActionGeo_ADM1Code TINYTEXT NULL,
    ActionGeo_ADM2Code TINYTEXT NULL,
    ActionGeo_Lat FLOAT NULL,
    ActionGeo_Long FLOAT NULL,
    ActionGeo_FeatureID TINYTEXT NULL,
    DATEADDED BIGINT NOT NULL,
    SOURCEURL TEXT NOT NULL,
    PRIMARY KEY ( GlobalEventID )
);
Query OK, 0 rows affected (0.01 sec)
LOAD DATA INFILE 'event.csv' 
    INTO TABLE event
    FIELDS TERMINATED BY '\t' 
    LINES TERMINATED BY '\n'
;
ERROR 1265 (01000): Data truncated for column 'Actor2Geo_Lat' at row 5

The raw data can be retrieved here. It looks like this:

4   Baghdad, Baghdad, Iraq  IZ  IZ07    36785   33.3386 44.3939 -3103581    4   Baghdad, Baghdad, Iraq
1   Russia  RS  RS      60  100 RS  1   Russia
1   Ukraine UP  UP      49  32  UP  1   Ukraine
4   Sydney, New South Wales, Australia  AS  AS02    154637  -33.8833    151.217 -1603135    4   Sydney, New South Wales, Australia
0                               3   Los Angeles, California, United States
4   Sydney, New South Wales, Australia  AS  AS02    154637  -33.8833    151.217 -1603135    4   Sydney, New South Wales, Australia

The fifth row has empty values and the columns before Actor2Geo_Lat can handle them.

What should I do to load the data without error? Thanks a lot!

Upvotes: 1

Views: 894

Answers (1)

antoine.lange
antoine.lange

Reputation: 951

LOAD DATA INFILE in fact doesn't handle empty entries. They must either have a value appropriate to the column's data type, or else the sequence \N to signify NULL.

See http://bugs.mysql.com/bug.php?id=64603

To fix this, you could substitute \N for empty entries with a sed command (or whatever equivalent text substitution tool if you use Windows).

Also you can read the fields into a local variable, and then sets the actual field value to NULL, if the local variable ends up containing an empty string. If they're all possibly empty, then you'd read them all into variables and have multiple SET statements, like this:

LOAD DATA INFILE 'event.csv'
INTO TABLE event
FIELDS TERMINATED BY '\t' 
LINES TERMINATED BY '\n'
(@GlobalEventID, @Day, @MonthYear, ...)
SET
GlobalEventID = NULLIF(@GlobalEventID,''),
Day = NULLIF(@Day,''),
MonthYear = NULLIF(@MonthYear,'')
...
;

Upvotes: 5

Related Questions