forgodsakehold
forgodsakehold

Reputation: 904

MySQL join query taking far too long

The following query is taking forever in MySQL 5.7.23, on a macOS 10.13.6 on 2018 Macbook Pro (A1990) with 50GB of free space:

INSERT INTO `a_sg1lib` (`book_id`,`title`, `isbn`, `Zstatus_retrieve_TOC`, `Zstatus_retrieve_classifybyoclc`, 
`classifybyoclc_respcode`, `classifybyoclc_calln_lcc`, `classifybyoclc_calln_ddc`, `classifybyoclc_calln_nlm`, 
`classifybyoclc_fast`, `classifybyoclc_owi`,`reference`, `data`, `crcomment`, `groupname`, `code`, `indentation`, `path`, `specialtag`, 
`specialtag_cat`, `specialtag_master`, `specialtag_master_cat`, `specialtag_override_cat_moddetails`, 
`specialtag_override_cat_timestamp`, `specialtag_override_cat`, `pageno`, `specialtag_escalated`, `history`, 
`history_classification`, `specialtag_esc_start`, `specialtag_esc_start_type`, `callCore`, 
`d1`, `d2`, `d3`, `d4`, `d5`, `d6`, `d7`, `d8`, `d9`, `d10`, `d11`, `d12`) 
SELECT `RU_sg1lib_classifybyoclc`.`ID`,`Title`, `IdentifierWODash`, `Zstatus_retrieve_TOC`, `Zstatus_retrieve_classifybyoclc`, `classifybyoclc_respcode`, `classifybyoclc_calln_lcc`, 
    `classifybyoclc_calln_ddc`, `classifybyoclc_calln_nlm`, 
    `classifybyoclc_fast`, `classifybyoclc_owi`,`reference`, `data`,
    `crcomment`, `groupname`, `code`, `indentation`, `path`, `specialtag`, `specialtag_cat`, 
    `specialtag_master`, `specialtag_master_cat`, `specialtag_override_cat_moddetails`, 
    `specialtag_override_cat_timestamp`, `specialtag_override_cat`, `pageno`, `specialtag_escalated`, `history`, 
    `history_classification`, `specialtag_esc_start`, `specialtag_esc_start_type`, `callCore`, 
    `d1`, `d2`, `d3`, `d4`, `d5`, `d6`, `d7`, `d8`, `d9`, `d10`, `d11`, `d12` 
FROM `RU_sg1lib_classifybyoclc` 
RIGHT JOIN `loc_classification`.`LOC_Classification_Text_zFULL_YYY`
   ON `RU_sg1lib_classifybyoclc`.`classifybyoclc_calln_lcc`=
      `loc_classification`.`LOC_Classification_Text_zFULL_YYY`.`callCore`;

Number of rows:

RU_sg1lib_classifybyoclc - 1+ million rows - MyISAM

LOC_Classification_Text_zFULL_YYY - 440000+ rows - InnoDB

a_sg1lib - InnoDB, initially empty

Schema:

 CREATE TABLE `RU_sg1lib_classifybyoclc` (
  `ID` int(15) unsigned NOT NULL AUTO_INCREMENT,
  `Title` varchar(2000) DEFAULT '',
  `VolumeInfo` varchar(100) DEFAULT '',
  `Series` varchar(300) DEFAULT '',
  `Periodical` varchar(200) DEFAULT '',
  `Author` varchar(1000) DEFAULT '',
  `Year` varchar(14) DEFAULT '',
  `Edition` varchar(60) DEFAULT '',
  `Publisher` varchar(400) DEFAULT '',
  `City` varchar(100) DEFAULT '',
  `Pages` varchar(100) DEFAULT '',
  `PagesInFile` int(10) unsigned NOT NULL DEFAULT '0',
  `Language` varchar(150) DEFAULT '',
  `Topic` varchar(500) DEFAULT '',
  `Library` varchar(50) DEFAULT '',
  `Issue` varchar(100) DEFAULT '',
  `Identifier` varchar(300) DEFAULT '',
  `ISSN` varchar(9) DEFAULT '',
  `ASIN` varchar(200) DEFAULT '',
  `UDC` varchar(200) DEFAULT '',
  `LBC` varchar(200) DEFAULT '',
  `DDC` varchar(45) DEFAULT '',
  `LCC` varchar(45) DEFAULT '',
  `Doi` varchar(45) DEFAULT '',
  `Googlebookid` varchar(45) DEFAULT '',
  `OpenLibraryID` varchar(200) DEFAULT '',
  `Commentary` varchar(10000) DEFAULT '',
  `DPI` int(6) unsigned DEFAULT '0',
  `Color` varchar(1) DEFAULT '',
  `Cleaned` varchar(1) DEFAULT '',
  `Orientation` varchar(1) DEFAULT '',
  `Paginated` varchar(1) DEFAULT '',
  `Scanned` varchar(1) DEFAULT '',
  `Bookmarked` varchar(1) DEFAULT '',
  `Searchable` varchar(1) DEFAULT '',
  `Filesize` bigint(20) unsigned NOT NULL DEFAULT '0',
  `Extension` varchar(50) DEFAULT '',
  `MD5` char(32) DEFAULT '',
  `Generic` char(32) DEFAULT '',
  `Visible` char(3) DEFAULT '',
  `Locator` varchar(733) DEFAULT '',
  `Local` int(10) unsigned DEFAULT '0',
  `TimeAdded` timestamp NOT NULL DEFAULT '2000-01-01 13:00:00',
  `TimeLastModified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `Coverurl` varchar(200) DEFAULT '',
  `Tags` varchar(500) DEFAULT '',
  `IdentifierWODash` varchar(300) DEFAULT '',
  `Zstatus_retrieve_TOC` varchar(255) DEFAULT NULL,
  `Zstatus_retrieve_classifybyoclc` varchar(255) DEFAULT NULL,
  `classifybyoclc_respcode` text,
  `classifybyoclc_calln_lcc` varchar(256) DEFAULT NULL,
  `classifybyoclc_calln_ddc` text,
  `classifybyoclc_calln_nlm` text,
  `classifybyoclc_fast` text,
  `classifybyoclc_owi` text,
  `classifybyoclc_timest` text,
  `pid` text,
  `ida` int(11) DEFAULT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `MD5` (`MD5`),
  KEY `Generic` (`Generic`) USING BTREE,
  KEY `VisibleTimeAdded` (`Visible`,`TimeAdded`) USING BTREE,
  KEY `TimeAdded` (`TimeAdded`) USING BTREE,
  KEY `Topic` (`Topic`(3)) USING BTREE,
  KEY `VisibleID` (`Visible`,`ID`) USING BTREE,
  KEY `VisibleTimeLastModified` (`Visible`,`TimeLastModified`,`ID`) USING BTREE,
  KEY `TimeLastModifiedID` (`TimeLastModified`,`ID`) USING BTREE,
  KEY `DOI_INDEX` (`Doi`) USING BTREE,
  KEY `Identifier` (`Identifier`),
  KEY `classifybyoclc_calln_lcc` (`classifybyoclc_calln_lcc`),
  KEY `classifybyoclc_fast` (`classifybyoclc_fast`(300)),
  FULLTEXT KEY `Title` (`Title`),
  FULLTEXT KEY `Author` (`Author`),
  FULLTEXT KEY `Language` (`Language`),
  FULLTEXT KEY `Extension` (`Extension`),
  FULLTEXT KEY `Publisher` (`Publisher`),
  FULLTEXT KEY `Series` (`Series`),
  FULLTEXT KEY `Year` (`Year`),
  FULLTEXT KEY `Title1` (`Title`,`Author`,`Series`,`Publisher`,`Year`,`Periodical`,`VolumeInfo`),
  FULLTEXT KEY `Tags` (`Tags`),
  FULLTEXT KEY `Identifierfulltext` (`IdentifierWODash`)
) ENGINE=InnoDB AUTO_INCREMENT=3246566 DEFAULT CHARSET=utf8 
    
CREATE TABLE `LOC_Classification_Text_zFULL_YYY` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `reference` longtext COLLATE utf8mb4_unicode_ci,
  `data` longtext COLLATE utf8mb4_unicode_ci,
  `crcomment` longtext COLLATE utf8mb4_unicode_ci,
  `groupname` longtext COLLATE utf8mb4_unicode_ci,
  `code` longtext COLLATE utf8mb4_unicode_ci,
  `indentation` int(10) unsigned DEFAULT NULL,
  `path` longtext COLLATE utf8mb4_unicode_ci,
  `specialtag` longtext COLLATE utf8mb4_unicode_ci,
  `specialtag_cat` mediumtext COLLATE utf8mb4_unicode_ci,
  `specialtag_master` mediumtext COLLATE utf8mb4_unicode_ci,
  `specialtag_master_cat` mediumtext COLLATE utf8mb4_unicode_ci,
  `specialtag_override_cat_moddetails` mediumtext COLLATE utf8mb4_unicode_ci,
  `specialtag_override_cat_timestamp` mediumtext COLLATE utf8mb4_unicode_ci,
  `specialtag_override_cat` mediumtext COLLATE utf8mb4_unicode_ci,
  `pageno` longtext COLLATE utf8mb4_unicode_ci,
  `specialtag_escalated` longtext COLLATE utf8mb4_unicode_ci,
  `history` longtext COLLATE utf8mb4_unicode_ci,
  `history_classification` longtext COLLATE utf8mb4_unicode_ci,
  `specialtag_esc_start` longtext COLLATE utf8mb4_unicode_ci,
  `specialtag_esc_start_type` longtext COLLATE utf8mb4_unicode_ci,
  `callCore` varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `d1` longtext COLLATE utf8mb4_unicode_ci,
  `d2` longtext COLLATE utf8mb4_unicode_ci,
  `d3` longtext COLLATE utf8mb4_unicode_ci,
  `d4` longtext COLLATE utf8mb4_unicode_ci,
  `d5` longtext COLLATE utf8mb4_unicode_ci,
  `d6` longtext COLLATE utf8mb4_unicode_ci,
  `d7` longtext COLLATE utf8mb4_unicode_ci,
  `d8` longtext COLLATE utf8mb4_unicode_ci,
  `d9` longtext COLLATE utf8mb4_unicode_ci,
  `d10` longtext COLLATE utf8mb4_unicode_ci,
  `d11` longtext COLLATE utf8mb4_unicode_ci,
  `d12` longtext COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`id`),
  KEY `indentation` (`indentation`),
  KEY `callCore` (`callCore`),
  FULLTEXT KEY `specialtag` (`specialtag`)
) ENGINE=InnoDB AUTO_INCREMENT=496218 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
    
    | a_sg1lib | CREATE TABLE `a_sg1lib` (
  `ida` int(11) DEFAULT '0',
  `classifybyoclc_respcode` text CHARACTER SET utf8mb4,
  `classifybyoclc_timest` text CHARACTER SET utf8mb4,
  `isbn` text CHARACTER SET utf8mb4,
  `choseStatus` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
  `choseStatus2` varchar(1000) CHARACTER SET utf8mb4 DEFAULT NULL,
  `choseStatusSpecial` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
  `choseStatusMarker` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
  `title` text CHARACTER SET utf8mb4,
  `subtitle` text CHARACTER SET utf8mb4,
  `publisher_name` text CHARACTER SET utf8mb4,
  `imprint_name` text CHARACTER SET utf8mb4,
  `publication_date` text CHARACTER SET utf8mb4,
  `edition_number` text CHARACTER SET utf8mb4,
  `authors` text CHARACTER SET utf8mb4,
  `editors` text CHARACTER SET utf8mb4,
  `others` text CHARACTER SET utf8mb4,
  `contributors` text CHARACTER SET utf8mb4,
  `is_activated` text CHARACTER SET utf8mb4,
  `public_url` text CHARACTER SET utf8mb4,
  `date_added` text CHARACTER SET utf8mb4,
  `format` text CHARACTER SET utf8mb4,
  `rating` text CHARACTER SET utf8mb4,
  `engagement_score` text CHARACTER SET utf8mb4,
  `university_list_count` text CHARACTER SET utf8mb4,
  `published_list_count` text CHARACTER SET utf8mb4,
  `award_count` text CHARACTER SET utf8mb4,
  `mobile_disabled` text CHARACTER SET utf8mb4,
  `categories` text CHARACTER SET utf8mb4,
  `year` text CHARACTER SET utf8mb4,
  `subjects` text CHARACTER SET utf8mb4,
  `topics` text CHARACTER SET utf8mb4,
  `topics_facet_filter` text CHARACTER SET utf8mb4,
  `topics_detailed` text CHARACTER SET utf8mb4,
  `main_subject` text CHARACTER SET utf8mb4,
  `main_topic` text CHARACTER SET utf8mb4,
  `keywords` text CHARACTER SET utf8mb4,
  `description` text CHARACTER SET utf8mb4,
  `language_id` text CHARACTER SET utf8mb4,
  `language` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
  `sales_rights` text CHARACTER SET utf8mb4,
  `cover_image` text CHARACTER SET utf8mb4,
  `related_isbns` text CHARACTER SET utf8mb4,
  `file_size` text CHARACTER SET utf8mb4,
  `mobile_disabled_v3` text CHARACTER SET utf8mb4,
  `is_restricted` text CHARACTER SET utf8mb4,
  `organisation_list` text CHARACTER SET utf8mb4,
  `objectID` text CHARACTER SET utf8mb4,
  `_highlightResult` text CHARACTER SET utf8mb4,
  `chapters` text CHARACTER SET utf8mb4,
  `Zstatus_retrieve_TOC` text CHARACTER SET utf8mb4,
  `Zstatus_retrieve_classifybyoclc` text CHARACTER SET utf8mb4,
  `pid` text CHARACTER SET utf8mb4,
  `ida2` int(11) DEFAULT NULL,
  `withdrawal_date` text CHARACTER SET utf8mb4,
  `book_id` int(13) DEFAULT NULL,
  `classifybyoclc_calln_lcc` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
  `classifybyoclc_calln_ddc` text CHARACTER SET utf8mb4,
  `classifybyoclc_calln_nlm` text CHARACTER SET utf8mb4,
  `classifybyoclc_fast` varchar(960) CHARACTER SET utf8mb4 DEFAULT NULL,
  `classifybyoclc_owi` text CHARACTER SET utf8mb4,
  `classifybyoclc_fast_max` varchar(512) CHARACTER SET utf8mb4 DEFAULT NULL,
  `id` int(10) unsigned DEFAULT '0',
  `reference` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `data` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `crcomment` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `groupname` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `code` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `indentation` int(10) unsigned DEFAULT NULL,
  `path` varchar(1000) DEFAULT NULL,
  `specialtag` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `specialtag_cat` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `specialtag_master` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `specialtag_master_cat` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `specialtag_override_cat_moddetails` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `specialtag_override_cat_timestamp` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `specialtag_override_cat` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `pageno` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `specialtag_escalated` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `history` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `history_classification` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `specialtag_esc_start` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `specialtag_esc_start_type` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `callCore` varchar(256) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `d1` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d2` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d3` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d4` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d5` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d6` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d7` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d8` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d9` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d10` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d11` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `d12` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `idmaster` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`idmaster`),
  KEY `choseStatus` (`choseStatus`),
  KEY `classifybyoclc_calln_lcc` (`classifybyoclc_calln_lcc`),
  KEY `book_id` (`book_id`),
  KEY `path` (`path`),
  KEY `choseStatusSpecial` (`choseStatusSpecial`),
  KEY `choseStatusMarker` (`choseStatusMarker`),
  KEY `choseStatus2` (`choseStatus2`(256)) USING BTREE,
  FULLTEXT KEY `title` (`title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

I thought what makes the query so slow is the fact that one of the tables is MyISAM, and tried to change that, but even that takes forever. Altering the columns involved in the join from TEXT to VARCHAR also takes forever ;(;(. Basically I am stuck ;(

May I know what is wrong, and how can I make this faster?

Upvotes: 0

Views: 203

Answers (3)

O. Jones
O. Jones

Reputation: 108641

What's going on here to make this slow? A few things.

  • You have big tables and no WHERE clause, so this statement handles a lot of data. It's never going to be sub-minute fast.

  • Your tables have many CLOB (mediumtext, longtext) columns. The way MySQL stores those columns makes retrieving and storing them surprisingly expensive. Your query insists on retrieving them all and storing them again.

  • You correctly identified the mix of MyISAM and InnoDB tables as a possible performance problem. But, your table definitions show InnoDB tables, so you may have successfully converted one of them. That's good.

  • You're joining on this:

    ON `RU_sg1lib_classifybyoclc`.`classifybyoclc_calln_lcc` 
     =  `loc_classification`.`LOC_Classification_Text_zFULL_YYY`.`callCore`;
    

    It's very important to performance that the columns in your ON-clauses have precisely the same definition. The declaration of classifybyoclc_calln_lcc in RU_sg1lib_classifybyoclc lacks both its own COLLATE clause and a default COLLATE for the table, so its collation is unknown. It is the database's default, which we don't know.

    varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL 
    

    The declaration of callCore in LOC_Classification_Text_zFULL_YYY is this.

    varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
    

    If your database default collation is also utf8mb4_unicode_ci you should be in good shape. Otherwise alter the first table to specify the collation for classifybyoclc_calln_lcc explicitly.

  • It's possible you could store those OCLC call numbers using CHARACTER SET latin1 COLLATE latin1_bin. They may not need to be case-insensitive and they may not contain characters outside the ASCII and western European character sets. If your call numbers work like that, you might alter your tables to use the more efficient collation. Latin-1 text takes a quarter as many bytes to store in indexes as utfmb4 text. If you need case-insensitive searching on the call numbers, use COLLATE latin1_general_ci.

  • RIGHT JOIN is just plain weird. Almost everybody uses LEFT JOIN, reversing the order of your two tables. Do you really need an outer join like that? If you don't need an outer join, use an ordinary INNER join.

What can you do about this? You didn't tell us why you need that a_sg1lib table. If you're using it with queries containing WHERE clauses, to find just a few rows at a time, you could create a VIEW rather than a TABLE. Each lookup will then use the indexes on your existing tables, and you won't have to copy the data.

Upvotes: 0

Wilson Hauck
Wilson Hauck

Reputation: 2343

forgodsakehold, We should talk if this is not clear. View my profile for contact info, please.

Observation 1,

RU_sg1lib_classifybyoclc.ID appears to be INT(15) unsigned 
SELECTED for push into a_sg11lib table as book_id

and

a_sg1lib.book_id appears to be simple INT(13)  (NOT unsigned and diff lengths)

most often we see same column attributes when moving data.

Observation 2, RIGHT JOIN of calln_lcc = to callCore

RU_sg1lib_classifbyoclc.classifybyoclc_calln_lcc is simple varchar(256)

and

LOC_Classification_Text_zFULL_YYY.callcore is varchar(256) COLLATE utf8mb4_unicode_ci 

most often we see matched column definitions for left and right side of =.

Upvotes: 0

Rick James
Rick James

Reputation: 142218

SHOW CREATE TABLE is more descriptive than DESCRIBE. In particular, I need to see if the indexes are composite and, if so, what order the columns are in.

RU_sg1lib_classifybyoclc:  INDEX(classifybyoclc_calln_lcc)

If it turns out that that is really a JOIN instead of a RIGHT JOIN, then this might be useful:

LOC_Classification_Text_zFULL_YYY:  INDEX(callCore)

I suspect that most of the columns are declared larger than necessary. Shrinking them may help some. For example, the 8-byte BIGINT is usually bigger than will ever be needed. year and some ids are unnecessarily (and inefficiently) TEXT.

Upvotes: 1

Related Questions