RafG

Reputation: 1506

Python pandas read_table converts zero to NaN

Say I have the following file test.txt:

Aaa Bbb
Foo 0
Bar 1
Baz NULL

(The separator is actually a tab character, which I can't seem to input here.) And I try to read it using pandas (0.10.0):

In [523]: pd.read_table("test.txt")
Out[523]:
   Aaa  Bbb
0  Foo  NaN
1  Bar    1
2  Baz  NaN

Note that the zero value in the second column (Bbb) has suddenly turned into NaN! I was expecting a DataFrame like this:

   Aaa   Bbb
0  Foo     0
1  Bar     1
2  Baz   NaN

What do I need to change to obtain the latter? I suppose I could use pd.read_table("test.txt", na_filter=False) and subsequently replace 'NULL' values with NaN and change the column dtype. Is there a more straightforward solution?
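
A minimal sketch of that workaround, for illustration only. It assumes the file contents shown above are supplied through an in-memory io.StringIO buffer instead of test.txt, and it uses pd.read_csv with sep="\t" (an assumption; pd.read_table on the real file should behave the same way):

```python
import io

import numpy as np
import pandas as pd

# Stand-in for test.txt (assumption: tab-separated, as described above).
data = "Aaa\tBbb\nFoo\t0\nBar\t1\nBaz\tNULL\n"

# na_filter=False turns off NA detection, so every Bbb value stays a string.
df = pd.read_csv(io.StringIO(data), sep="\t", na_filter=False)

# Replace the literal 'NULL' marker with NaN, then convert the column to numbers.
df["Bbb"] = pd.to_numeric(df["Bbb"].replace("NULL", np.nan))

print(df)
print(df.dtypes)
```

The column ends up as a float type because NaN is a floating-point value, so 0 and 1 are stored as 0.0 and 1.0.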

Upvotes: 2

Views: 1968

Answers (2)

DSM

Reputation: 353429

I think this is issue #2599, "read_csv treats zeroes as nan if column contains any nan", which is now closed. I can't reproduce it in my development version:

In [27]: with open("test.txt") as fp:
   ....:     for line in fp:
   ....:         print repr(line)
   ....:         
'Aaa\tBbb\n'
'Foo\t0\n'
'Bar\t1\n'
'Baz\tNULL\n'

In [28]: pd.read_table("test.txt")
Out[28]: 
   Aaa  Bbb
0  Foo    0
1  Bar    1
2  Baz  NaN

In [29]: pd.__version__
Out[29]: '0.10.1.dev-f7f7e13'

Upvotes: 2

Thorsten Kranz

Reputation: 12765

Try:

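import pandas as pd
# sep=" " matches a space-separated copy of the input;
# use sep="\t" if the file is tab-separated as in the question.
# na_values="NULL" explicitly marks the string NULL as missing.
df = pd.read_table("14256839_input.txt", sep=" ", na_values="NULL")
print(df)
print(df.dtypes)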

This gives me:

   Aaa  Bbb
0  Foo    0
1  Bar    1
2  Baz  NaN
Aaa     object
Bbb    float64
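
A self-contained variant of the same idea, as a sketch. It assumes the data is tab-separated as in the question and reads it from an io.StringIO buffer instead of a file on disk; the buffer and the use of pd.read_csv are illustrative choices, not part of the original answer:

```python
import io

import pandas as pd

# Stand-in for the input file (assumption: tab-separated, as in the question).
data = "Aaa\tBbb\nFoo\t0\nBar\t1\nBaz\tNULL\n"

# na_values lists the strings that should be parsed as missing values.
df = pd.read_csv(io.StringIO(data), sep="\t", na_values=["NULL"])

print(df)
print(df.dtypes)
```

Listing "NULL" explicitly documents the intent. The output in the question suggests that pandas already treats "NULL" as missing by default, in which case the argument is redundant but harmless.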

Upvotes: 0
