Reputation: 74705
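Consider the following sequence of operations:
1. Create a DataFrame with two columns of dtypes int64 and float64
2. Cast all columns to object
3. Inspect the resulting DataFrame
4. Write it to a file
Illustrated below: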
import pandas as pd
# Step 1
df = pd.DataFrame.from_dict({'a': [3, 2, 1, 0], 'b': [1, 500.43, 256.13, 5]})
# Step 2
df2 = df.astype(object)
# Step 3
df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       4 non-null      object
 1   b       4 non-null      object
dtypes: object(2)
memory usage: 192.0+ bytes
# NOTE notice how column `b` is rendered
df2
   a       b
0  3       1
1  2  500.43
2  1  256.13
3  0       5
# Step 4
df2.to_csv("/tmp/df2", index=False, sep="\t")
Now let us inspect the generated output:
$ cat /tmp/df2
a b
3 1.0
2 500.43
1 256.13
0 5.0
Notice how column b is persisted: the decimal places are still present for round numbers even though the datatype is object. Why does this happen? What am I missing here?
I'm using Pandas 1.1.2 with Python 3.7.9.
Upvotes: 2
Views: 800
Reputation: 81
I think 'object' is a NumPy/pandas dtype, not a Python data type: it only says that the column holds arbitrary Python objects. If you run:
type(df2.iloc[0, 1])
before step 4, you will get 'float', even though the column dtype has already been changed to 'object'.
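To check this, here is a small sketch (the variable names first_b and written are mine, not from the question). It looks at the type of the stored value and at what to_csv produces when no path is given, in which case it returns the text instead of writing a file:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1, 0], 'b': [1, 500.43, 256.13, 5]})
df2 = df.astype(object)

# The column dtype is object, but each stored value is still a Python float
first_b = df2.iloc[0, 1]
print(df2['b'].dtype, type(first_b))

# With no path argument, to_csv returns the text it would have written
written = df2.to_csv(index=False, sep="\t")
print(written)
```

The float is written using its normal text form, which is why 1.0 and 5.0 show up in the file.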
Instead of casting in step 2, you can format the floats when writing the original DataFrame:
df.to_csv("df.csv", float_format='%g', index=False, sep="\t")
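As a rough sketch of that suggestion (the variable name out is mine): as far as I can tell, float_format only affects float columns, so it should be used on the original df rather than on the object-typed df2. Keep in mind that '%g' keeps 6 significant digits by default, so a value like 123456.789 would be written as 123457; a format such as '%.15g' may be safer if you need more precision.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1, 0], 'b': [1, 500.43, 256.13, 5]})

# No astype(object) here: column b stays float64, so float_format controls its text
out = df.to_csv(float_format='%g', index=False, sep="\t")
print(out)
```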
Upvotes: 1
Reputation: 8508
I am not great with pandas and am still learning. I looked at a few solutions and thought: why not apply a conversion to the data before sending it to the CSV file?
Here's what I did to get the values written as 1 and 5 instead of 1.0 and 5.0:
import pandas as pd
df = pd.DataFrame.from_dict({'a': [3, 2, 1, 's', 't'], 'b': [1, 500.43, 256.13, 5, 'txt']})
df2 = df.astype(object)
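def convert(x):
    # Apply coerce to every value in the column
    a = []
    for i in x.to_list():
        a.append(coerce(i))
    return pd.Series(a)

def coerce(y):
    # Write whole numbers without a decimal part; keep everything else as text
    try:
        p = float(y)
        q = int(y)
        if p != q:
            return str(p)
        else:
            return str(q)
    except (TypeError, ValueError, OverflowError):
        # Not a finite number (e.g. 's' or 'txt'): keep it as a string
        return str(y)
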
df2.apply(convert).to_csv("abc.txt", index=False, sep="\t")
Output in the file will be:
a b
3 1
2 500.43
1 256.13
s 5
t txt
A shorter version of the same idea:
import pandas as pd
df = pd.DataFrame.from_dict({'a': [3, 2, 1, 0], 'b': [1, 500.43, 256.13, 5]})
df2 = df.astype(object)
def convert(x):
    return pd.Series([str(int(i)) if int(i) == i else i for i in x.to_list()])
df2.apply(convert).to_csv("abc.txt", index=False, sep="\t")
The output is as follows:
a b
3 1
2 500.43
1 256.13
0 5
Here I am assuming all values in df2 are numeric. If it contains a non-numeric string, int(i) will raise a ValueError.
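A variant that avoids both that assumption and the try/except is to check float.is_integer() on each value. This is only a sketch: fmt_value is a hypothetical helper name, and it returns strings for everything so that pandas does not infer a float column again after the mapping.

```python
import pandas as pd

def fmt_value(value):
    # Hypothetical helper: whole floats lose their ".0", everything else keeps str()
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # NaN and inf are floats but not integral, so they fall through to str()
    return str(value)

df = pd.DataFrame({'a': [3, 2, 1, 0], 'b': [1, 500.43, 256.13, 5]})
df2 = df.astype(object)

# Map the helper over every column, then write as before
cleaned = df2.apply(lambda col: col.map(fmt_value))
text = cleaned.to_csv(index=False, sep="\t")
print(text)
```

Because non-float values are simply passed through str(), the same code also works when a column contains text such as 's' or 'txt'.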
Upvotes: 0