Reputation: 1506
Say I have the following file test.txt
:
Aaa Bbb
Foo 0
Bar 1
Baz NULL
(The separator is actually a tab character, which I can't seem to input here.) And I try to read it using pandas (0.10.0):
In [523]: pd.read_table("test.txt")
Out[523]:
Aaa Bbb
0 Foo NaN
1 Bar 1
2 Baz NaN
Note that the zero value in the first column has suddenly turned into NaN! I was expecting a DataFrame like this:
Aaa Bbb
0 Foo 0
1 Bar 1
2 Baz NaN
What do I need to change to obtain the latter? I suppose I could use pd.read_table("test.txt", na_filter=False)
and subsequently replace 'NULL' values with NaN and change the column dtype. Is there a more straightforward solution?
Upvotes: 2
Views: 1968
Reputation: 353429
I think this is issue #2599, "read_csv treats zeroes as nan if column contains any nan", which is now closed. I can't reproduce in my development version:
In [27]: with open("test.txt") as fp:
....: for line in fp:
....: print repr(line)
....:
'Aaa\tBbb\n'
'Foo\t0\n'
'Bar\t1\n'
'Baz\tNULL\n'
In [28]: pd.read_table("test.txt")
Out[28]:
Aaa Bbb
0 Foo 0
1 Bar 1
2 Baz NaN
In [29]: pd.__version__
Out[29]: '0.10.1.dev-f7f7e13'
Upvotes: 2
Reputation: 12765
Try:
import pandas as pd
df = pd.read_table("14256839_input.txt", sep=" ", na_values="NULL")
print df
print df.dtypes
This gives me
Aaa Bbb
0 Foo 0
1 Bar 1
2 Baz NaN
Aaa object
Bbb float64
Upvotes: 0