Reputation: 1084
When I import the following .sql file (4 records inserted):
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
CREATE TABLE IF NOT EXISTS `sentences` (
`jp` text character set utf8 collate utf8_unicode_ci,
`eng` text character set utf8 collate utf8_unicode_ci,
`reading` text character set utf8 collate utf8_unicode_ci,
`query` varchar(50) character set utf8 collate utf8_unicode_ci default NULL,
`patternIDs` varchar(100) character set utf8 collate utf8_unicode_ci default NULL,
`hasImage` tinyint(1) NOT NULL,
`imageURL` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
`id` int(11) NOT NULL auto_increment,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=eucjpms;
INSERT INTO `sentences` (`jp`, `eng`, `reading`, `query`, `patternIDs`, `hasImage`, `imageURL`, `id`) VALUES
('ムーリエルは20歳になりました。', 'Muiriel is 20 now.', 'はにぜろさいになりました。', 'ムーリエル', '64', 0, 'none', 1),
('すぐに戻ります。', 'I will be back soon.', 'すぐにもどります。', 'すぐ', '4', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=959017328936&id=b33b9daf539756a8b0b2364f63088008', 2),
('すぐに諦めて昼寝をするかも知れない。', 'I may give up soon and just nap instead.', 'すぐにあきらめてひるねをするかもしれない。', '昼寝', '19', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=888895375610&id=5debb6afed90989674d447f9493b4a1d', 3),
('ログアウトするんじゃなかったよ。', 'I shouldn\'t have logged off.', 'ログアウトするんじゃなかったよ。', 'ログアウト', '16', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=846535990996&id=4e0ad521154e2e7456330af87b24ee71', 4);
and then browse the sentences table, all the Japanese sentences can be viewed without any problems in UTF-8 encoding. However, when I import the following file (exactly the same thing; the only difference is the size: ~73000 records inserted, not 4):
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
CREATE TABLE IF NOT EXISTS `sentences` (
`jp` text character set utf8 collate utf8_unicode_ci,
`eng` text character set utf8 collate utf8_unicode_ci,
`reading` text character set utf8 collate utf8_unicode_ci,
`query` varchar(50) character set utf8 collate utf8_unicode_ci default NULL,
`patternIDs` varchar(100) character set utf8 collate utf8_unicode_ci default NULL,
`hasImage` tinyint(1) NOT NULL,
`imageURL` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
`id` int(11) NOT NULL auto_increment,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=eucjpms;
INSERT INTO `sentences` (`jp`, `eng`, `reading`, `query`, `patternIDs`, `hasImage`, `imageURL`, `id`) VALUES
('ムーリエルは20歳になりました。', 'Muiriel is 20 now.', 'はにぜろさいになりました。', 'ムーリエル', '64', 0, 'none', 1),
('すぐに戻ります。', 'I will be back soon.', 'すぐにもどります。', 'すぐ', '4', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=959017328936&id=b33b9daf539756a8b0b2364f63088008', 2),
('すぐに諦めて昼寝をするかも知れない。', 'I may give up soon and just nap instead.', 'すぐにあきらめてひるねをするかもしれない。', '昼寝', '19', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=888895375610&id=5debb6afed90989674d447f9493b4a1d', 3),
('ログアウトするんじゃなかったよ。', 'I shouldn\'t have logged off.', 'ログアウトするんじゃなかったよ。', 'ログアウト', '16', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=846535990996&id=4e0ad521154e2e7456330af87b24ee71', 4),
-- ... (rows between id 4 and id 73660 omitted) ...
('先生に質問したら、すぐに答えてくれました。', 'When I asked a question to my teacher, he/she immediately answered it.', 'せんせいにしつもんしたら、すぐにこたえてくれました。', '先生', '64, 189', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=889488746606&id=53a411907232964b30b9ebde03093a66', 73660),
('薬を飲んだら、すぐになおりました。', 'I took a medicine, and soon recovered.', 'くすりをのんだら、すぐになおりました。', '薬', '19, 64, 189', 1, 'http://ts2.mm.bing.net/images/thumbnail.aspx?q=934254550695&id=4400863ae021a4827dd7f9f7380fc2a2', 73661);
I can't see Japanese characters. Why is that? Why does phpMyAdmin have encoding problems when importing bigger .sql files? Thanks, guys!
Upvotes: 2
Views: 1443
Reputation: 1027
There are a lot of difficulties that can arise from the language and encoding being used. There is an invaluable source of info at http://www.herongyang.com/PHP-Chinese/ specifically for Chinese issues, and many of the discussions would also apply to any Unicode text, including Japanese.
For example, Heron Yang gives a possible flow:
H1. Key Sequences -> from keyboard (Text editor) ->
H2. HTML Document -> (Web server) ->
H3. HTTP Response -> (Internet TCP/IP Connection) ->
H4. HTTP Response -> (Web browser) ->
H5. Visual characters on the screen
Basically, you need to make sure there are no problems at any step of the import process (and the output process). The first step is the "garbled data on the phpmyadmin wiki" pointed out by Plebsori. Unfortunately, that wiki illustrates some problems but, I think, not the solutions.
I'd start by checking that the encodings of the two .sql files are exactly the same. To test, you could edit the 73,000-entry file in Notepad++ and delete all but the first four rows. Some text editors change the encoding when saving, which would make the encodings of the two files differ even if they look exactly the same, so make sure you save both files exactly the same way. For Chinese, I would often use Notepad++ to change the encoding of a file. Encoding is so important that Notepad++ has it as one of the menus on the menu bar.
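If you'd rather check the files without opening them in an editor, here is a minimal Python sketch that reports whether a file's raw bytes decode cleanly as UTF-8. The function names are just illustrative, and it assumes the files are small enough to read into memory. It only tells you whether a file is valid UTF-8, not which other encoding it might be in.

```python
import sys
from pathlib import Path


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if all valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def check_sql_file(path: str):
    """Read a file as raw bytes (no editor involved) and check it as UTF-8."""
    return first_invalid_utf8_offset(Path(path).read_bytes())


if __name__ == "__main__":
    # Usage: python check_utf8.py small.sql big.sql   (file names are placeholders)
    for name in sys.argv[1:]:
        offset = check_sql_file(name)
        if offset is None:
            print(f"{name}: valid UTF-8")
        else:
            print(f"{name}: invalid UTF-8 starting at byte {offset}")
```

Run it on both .sql files; if the small one passes and the big one does not, the files themselves differ and the problem is upstream of phpMyAdmin.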
Another issue that can arise with files is the BOM (byte-order mark) at the start of the text stream: http://en.wikipedia.org/wiki/Byte-order_mark. This invisible mark is what phpMyAdmin might use to determine the encoding to convert from. Again, I'd use Notepad++ (Encoding menu) to guarantee that the BOM is present. You can also use the TextFX setting (TextFX > TextFX Viz Settings > Viz Copy-Cut also in unicode), because copy/paste might change the encoding.
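Since the BOM is invisible in most editors, it can help to look at the first bytes directly. A rough Python sketch, using the BOM constants from the standard codecs module (the function names are my own):

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(head: bytes):
    """Return the name of the BOM that `head` starts with, or None if there is none."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def file_bom(path: str):
    """Inspect only the first four bytes of a file for a BOM."""
    with open(path, "rb") as fh:
        return detect_bom(fh.read(4))
```

Calling file_bom() on both .sql files shows whether one was saved with a BOM and the other without.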
Finally, there are still a lot of links in the chain. The good thing is that once you figure out how to get the data in and out properly while preserving the language, it can be quite straightforward to do it again later. By the way, if you try the encoding tip I suggested and have verified that file formats are not the source of the problem, there are some tricks for importing the data. You can convert the UTF-8 to ASCII (it will look like garbage characters), import it, and then convert it back to the encoding you want inside SQL.
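To show why that round trip does not lose anything, here is a small Python illustration. One assumption to flag: strict ASCII cannot hold bytes above 127, so the sketch uses Latin-1 as the single-byte stand-in, since it maps every byte value to a character. It only demonstrates the idea on a string; the convert-back step inside the database is not shown.

```python
def utf8_as_single_byte_text(text: str) -> str:
    """Reinterpret the UTF-8 bytes of `text` as Latin-1 characters (looks like garbage)."""
    return text.encode("utf-8").decode("latin-1")


def recover_utf8(garbled: str) -> str:
    """Undo the reinterpretation: back to the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "すぐに戻ります。"
garbled = utf8_as_single_byte_text(original)
restored = recover_utf8(garbled)

print(repr(garbled))          # one garbage character per UTF-8 byte
print(restored == original)   # the original text comes back unchanged
```

The garbled form is ugly but carries exactly the same bytes, which is why it can pass through a step that does not understand UTF-8.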
Upvotes: 2
Reputation: 53607
You hit the server's time limit or size limit, and phpMyAdmin is smart enough (or not) to continue from approximately where it stopped. Since the encoding command (the SET NAMES utf8 line) is at the start of the file, the second connection starts without any encoding settings.
Solution: either put the encoding command every few hundred lines or use file import.
File import on Ubuntu:
sudo mysql -u [user name] -p [database name] < [sql file name]
[Ubuntu sudo password]
[mysql password]
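For the first option (repeating the encoding command), a rough Python sketch is below. It assumes the dump is split into several INSERT statements that each start at the beginning of a line, and it re-issues the command before every INSERT rather than every few hundred lines, because a statement boundary is the only safe place to add it. If the dump is one giant multi-row INSERT, this adds only one command and will not help. The function names and file names are placeholders.

```python
def repeat_set_names(lines, command="SET NAMES utf8;"):
    """Yield the dump's lines, emitting `command` before each INSERT statement."""
    for line in lines:
        if line.lstrip().upper().startswith("INSERT INTO"):
            yield command + "\n"
        yield line


def rewrite_dump(src: str, dst: str) -> None:
    """Copy `src` to `dst`, repeating the encoding command before each INSERT."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.writelines(repeat_set_names(fin))
```

For example, rewrite_dump("sentences.sql", "sentences_fixed.sql") and then import the second file.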
Upvotes: 0
Reputation: 1085
Here are a few suggestions that may help.
I'd suggest that you confirm you're able to post a 23 MB file to the server. The PHP config file has a limit setting for the size of a POST (post_max_size, plus upload_max_filesize for uploaded files).
I'd also suggest that you confirm the PHP max execution time (max_execution_time) isn't being hit and causing the import to finish early.
Maybe you could import the SQL file from the command line:
mysql -u {username} -p{password} -h {serverHost} {databaseName} < {fileName}.sql
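For the first suggestion, a quick way to sanity-check the size limits is to compare the file size with the values from your php.ini. This Python sketch parses PHP's shorthand sizes (K, M and G suffixes, as multiples of 1024). The function names are made up for illustration, it does not handle special values such as 0, and the real POST body is slightly larger than the file itself, so treat a near miss as a failure.

```python
import os

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value: str) -> int:
    """Convert a PHP shorthand size such as '8M' or '1024' to a number of bytes."""
    value = value.strip()
    unit = value[-1:].upper()
    if unit in _UNITS:
        return int(value[:-1]) * _UNITS[unit]
    return int(value)


def fits_upload_limits(file_size: int, post_max_size: str, upload_max_filesize: str) -> bool:
    """True if a file of `file_size` bytes is within both configured limits."""
    limit = min(php_size_to_bytes(post_max_size), php_size_to_bytes(upload_max_filesize))
    return file_size <= limit


def file_fits(path: str, post_max_size: str, upload_max_filesize: str) -> bool:
    """Same check, taking the size from a file on disk."""
    return fits_upload_limits(os.path.getsize(path), post_max_size, upload_max_filesize)
```

For example, file_fits("sentences.sql", "8M", "2M") uses hypothetical limit values; substitute whatever your own php.ini or phpinfo() reports.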
Upvotes: 2