Reputation: 23
My CSV file shows garbled text where there should be Chinese characters. I want to read the file into Python so that the Chinese characters display correctly. How do I do that? I tried pandas.read_csv with encodings such as gb2312 and gb18030, but they all raise an error like: UnicodeDecodeError: 'gb2312' codec can't decode byte 0xae in position 4: illegal multibyte sequence
CODE NAME LISTDATE FOUNDDATE TIME DATE EPTTM INDUSTRY LISTCITY
000001.SZ Âπ≥ÂÆâÈì∂Ë°å 3/4/1991 19871222 8 1/1/2007 0.030477768 Ω»⁄∑˛ŒÒ …Ó€⁄
000002.SZ ‰∏áÁßëA 29/1/1991 19840530 8 1/1/2007 0.025771537 ∑øµÿ≤˙ …Ó€⁄
000004.SZ ÂõΩÂÜúÁßëÊäÄ 14/1/1991 19860505 8 1/1/2007 -0.05297144 “Ω“©…˙ŒÔ …Ó€⁄
000005.SZ ‰∏ñÁ∫™ÊòüÊ∫ê 10/12/1990 19870730 8 1/1/2007 -0.024968897 ∑øµÿ≤˙ …Ó€⁄
000006.SZ Ê∑±Êå؉∏öA 27/4/1992 19850525 8 1/1/2007 0.074647402 ∑øµÿ≤˙ …Ó€⁄
000007.SZ ÂÖ®Êñ∞•Ω,13/4/1992 19830311 NA 8 1/1/2007 NA ∑øµÿ≤˙ …Ó€⁄
000008.SZ Á•ûÂ∑ûÈ´òÈìÅ 7/5/1992 19891011 8 1/1/2007 -0.010574387 ◊€∫œ …Ó€⁄
000009.SZ ‰∏≠ÂõΩÂÆùÂÆâ 25/6/1991 19830706 8 1/1/2007 0.009576133 ∑øµÿ≤˙ …Ó€⁄
Upvotes: 1
Views: 3178
Reputation: 1152
import pandas as pd
data06_16 = pd.read_csv("yourfile.csv", encoding="GBK")
Try passing encoding="GBK" to read_csv; it works well for me.
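If it is not clear which encoding the file uses, one option is to try a few candidates in order and keep the first one that decodes without error. The sketch below is only an illustration: the helper name decode_with_fallback, the candidate list, and the in-memory sample bytes are all assumptions made for the example, not part of the original file.

```python
import csv
import io

# Candidate encodings to try, in order. This list is an assumption;
# adjust it to whatever encodings are plausible for your data.
CANDIDATES = ("utf-8", "gb18030")


def decode_with_fallback(raw, encodings=CANDIDATES):
    """Return (text, encoding) for the first encoding that decodes raw cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError(f"none of {encodings} could decode the data")


# Hypothetical sample: GBK-encoded bytes standing in for the file contents.
sample = "CODE,NAME,LISTCITY\n000001.SZ,平安银行,深圳\n".encode("gbk")

text, used = decode_with_fallback(sample)
rows = list(csv.reader(io.StringIO(text)))
print(used)
print(rows[1])
```

For a real file, the bytes could come from open("yourfile.csv", "rb").read(), and the encoding name that succeeds can then be passed on as pd.read_csv("yourfile.csv", encoding=used). If every candidate fails, the file may mix several encodings or be damaged, in which case no single encoding argument will be enough.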
Upvotes: 1