Reputation: 348
I have a data file which has \n\n at the end of every line.
http://pan.baidu.com/s/1o6jq5q6
My system:win7+python3.3+R-3.0.3
In R
sessionInfo()
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
In python: chcp 936
I can read it in R.
read.table("test.pandas",sep=",",header=TRUE)
It is that simple.
I can also read it in Python and get almost the same output.
fr=open("g:\\test.pandas","r",encoding="gbk").read()
data=[x for x in fr.splitlines() if x.strip() !=""]
for idx,line in enumerate(data):
    print(str(idx)+","+line)
When I read it with the pandas module in Python,
import pandas as pd
pd.read_csv("test.pandas",sep=",",encoding="gbk")
I found two problems in the output:
1) How do I right-align the columns? (I have already asked this in another post: "how to set alignment in pandas in python with non-ANSI characters".)
2) There is a NaN row after every real data row.
Can I improve my pandas code to get a better display in the console?
Upvotes: 3
Views: 6511
Reputation: 368
Your file, when read with open('test.pandas', 'rb'), seems to contain '\r\r\n' as its line terminators. Python 3.3 seems to convert this to '\n\n' when the file is read with open('test.pandas', 'r', encoding='gbk'), while Python 2.7 converts it to '\r\n' when the file is read in text mode.
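The translation can be checked without the original file. This is only a minimal sketch for Python 3: the bytes in raw and the file name sample.csv are made up for illustration and are not taken from test.pandas.

```python
import os
import tempfile

# Hypothetical sample: two records, each terminated by '\r\r\n'.
raw = b"name,sex\r\r\nAnn,F\r\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "wb") as out_file:
        out_file.write(raw)

    # Default text mode: a lone '\r' and a '\r\n' pair each become '\n'.
    with open(path, "r", encoding="gbk") as in_file:
        translated = in_file.read()

    # newline="" switches the translation off and returns the raw terminators.
    with open(path, "r", encoding="gbk", newline="") as in_file:
        untranslated = in_file.read()

print(repr(translated))    # 'name,sex\n\nAnn,F\n\n'
print(repr(untranslated))  # 'name,sex\r\r\nAnn,F\r\r\n'
```

The doubled '\n' in the first result is what produces an empty line after every record.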
pandas.read_csv does have a lineterminator parameter but it only accepts single character terminators.
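For reference, this is roughly how lineterminator behaves with a single character. The sample text and the '~' terminator are made up for illustration; a three-character terminator such as '\r\r\n' cannot be expressed this way.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in which '~' ends each record instead of a newline.
text = "name,sex~Ann,F~Bob,M"

# A one-character terminator is accepted and splits the text into records.
df = pd.read_csv(StringIO(text), lineterminator="~")
print(df)
```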
What you can do is preprocess the file a bit before passing it to pandas.read_csv(). You can use StringIO, which wraps a string buffer in a file interface, so that you don't need to write out a temporary file first.
import pandas as pd
from io import StringIO
with open('test.pandas', 'r', encoding='gbk') as in_file:
    # Collapse the doubled newlines so that each record ends with a single '\n'.
    contents = in_file.read().replace('\n\n', '\n')
df = pd.read_csv(StringIO(contents))
(I don't have the GBK charset on my machine, so the Chinese characters show up as '?' in the output below.)
>>> df[0:10]
??????? ??? ????????
0 HuangTianhui ?? 1948/05/28
1 ?????? ? 1952/03/27
2 ??? ? 1994/12/09
3 LuiChing ? 1969/08/02
4 ???? ?? 1982/03/01
5 ???? ?? 1983/08/03
6 YangJiabao ? 1988/08/25
7 ?????????????? ?? 1979/07/10
8 ?????? ? 1949/10/20
9 ???»? ? 1951/10/21
In Python 2.7, StringIO() was in the StringIO module instead of io.
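A possible variant, if you would rather not depend on how a given Python version translates newlines: open the file with newline="" and replace the raw '\r\r\n' directly. This is only a sketch for Python 3; the helper name read_csv_with_doubled_cr and the sample data are hypothetical, and it assumes no field in the file legitimately contains '\r\r\n'.

```python
import os
import tempfile
from io import StringIO

import pandas as pd


def read_csv_with_doubled_cr(path, encoding="gbk"):
    """Hypothetical helper: read a CSV whose records end in '\\r\\r\\n'."""
    # newline="" disables newline translation, so the terminators arrive raw.
    with open(path, "r", encoding=encoding, newline="") as in_file:
        contents = in_file.read().replace("\r\r\n", "\n")
    # StringIO lets read_csv parse the cleaned text without a temporary file.
    return pd.read_csv(StringIO(contents))


# Made-up sample standing in for test.pandas.
raw = "name,sex\r\r\nAnn,F\r\r\nBob,M\r\r\n".encode("gbk")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "wb") as out_file:
        out_file.write(raw)
    df = read_csv_with_doubled_cr(path)

print(df)
```

With the terminators collapsed before parsing, the sample yields one row per record and no empty rows in between.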
Upvotes: 2