Gravel

Reputation: 465

Why is python pandas dataframe rounding my values?

I do not understand why my pandas dataframe is rounding the values in a column that I compute by dividing one column by another. I want the numbers in the new column to have two decimals, but the values are rounded to integers. I have checked the dtypes of the columns and both are "float64".

import os
import re

import pandas as pd
import numpy as np


# parent of the current working directory
cd = os.path.dirname(os.getcwd())

# concatenate csv files
dfList = []

for root, dirs, files in os.walk(cd):
    for fname in files:
        if re.match(r"output_contigs_SCMgenes\.csv", fname):
            frame = pd.read_csv(os.path.join(root, fname))
            dfList.append(frame)    

df = pd.concat(dfList)

# replace NaN in the SCM column with 0
df['SCM'] = df['SCM'].fillna(0)

#add column with genes/SCM
df['genes/SCM'] = df['genes']/df['SCM']

The output is as follows:

    genome  contig  genes  SCM  genes/SCM
0    20900      48      1    0        inf
1    20900      37    130  103          1
2    20900      35      1    1          1
3    20900       1     79   66          1
4    20900      66      5    3          2

But I want the last column to contain unrounded values, with at least 2 decimals.

Upvotes: 8

Views: 13910

Answers (5)

Anu

Reputation: 3440

I faced a similar issue. If you're reading data from a CSV file, use the option float_precision='round_trip':

pd.read_csv(resultant_file, sep='\t', float_precision='round_trip')

This preserves the full precision of your values; without this option, the parser limits precision in favour of speed (see @MarkDickinson's comment).

If the problem is instead how the data frame is displayed in a Jupyter notebook, set display.precision as follows:

pd.set_option("display.precision", 20)
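To make the two suggestions above concrete, here is a minimal sketch. The in-memory CSV text and the column name "value" are made up for illustration; they are not from the question. It parses the same number with the default parser and with float_precision='round_trip', then changes only the display precision.

```python
import io

import pandas as pd

# Hypothetical one-column CSV held in memory (not the asker's file).
csv_text = "value\n0.12345678901234567\n"

# Same text parsed with the default parser and with the round-trip parser.
fast = pd.read_csv(io.StringIO(csv_text))
exact = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

# The round-trip parser agrees with Python's own float() conversion.
print(exact["value"].iloc[0] == float("0.12345678901234567"))

# Display precision is a separate setting: it changes what is printed,
# not what is stored in the column.
pd.set_option("display.precision", 10)
print(exact)
pd.reset_option("display.precision")
```

Any difference between the two parsers is at the level of the last few bits of a float64, so it would not explain values being shown as whole numbers; the display setting is the more likely place to look for that.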

Upvotes: 2

anonymous

Reputation: 143

Try applying the round() function to the result of the division:

df['genes/SCM'] = (df['genes']/df['SCM']).round(2)
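One detail worth checking with this approach: .round(2) applies to whatever expression it is called on, so rounding the quotient (rather than the SCM column) needs parentheses around the division. A small sketch, using a made-up frame modelled on rows 1 to 4 of the question's output:

```python
import pandas as pd

# Hypothetical sample modelled on the question's table (rows 1-4).
df = pd.DataFrame({"genes": [130, 1, 79, 5], "SCM": [103, 1, 66, 3]})

# Parentheses make round() apply to the result of the division,
# not to the SCM column before dividing.
df["genes/SCM"] = (df["genes"] / df["SCM"]).round(2)

print(df)
```

Note that this changes the stored values, which is different from changing how many decimals pandas prints.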

Upvotes: 0

Nafeez Quraishi

Reputation: 6168

To round to a desired number of digits after the decimal point, e.g. 2 digits as asked in the question:

df.round({'genes/SCM': 2})

For multiple columns:

df.round({'col1_name': 1, 'col2_name': 2})

Also, check that the display precision is not set to 0; pd.set_option('display.precision', 5) can be used to set it appropriately. Here 5 is an example value for the number of digits shown after the decimal point.
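As a short illustration of the dict form of DataFrame.round, here is a sketch on a made-up frame (the values are borrowed from the question's table, but the frame itself is an assumption). It also shows that round returns a new DataFrame, so the result has to be assigned if you want to keep it.

```python
import pandas as pd

# Hypothetical sample; not the asker's real data.
df = pd.DataFrame({"genes": [130, 79, 5], "SCM": [103, 66, 3]})
df["genes/SCM"] = df["genes"] / df["SCM"]

# round() returns a new DataFrame; df itself is left unchanged.
rounded = df.round({"genes/SCM": 2})

print(rounded["genes/SCM"].tolist())
print(df["genes/SCM"].iloc[0])  # still the full-precision quotient
```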

Upvotes: 1

zipa

Reputation: 27869

I can't be sure because I can't reproduce this, but you can try adding:

from __future__ import division

at the very top of your script.

Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210842

I could reproduce this behaviour by setting pd.options.display.precision to 0:

In [4]: df['genes/SCM'] = df['genes']/df['SCM']

In [5]: df
Out[5]:
   genome  contig  genes  SCM  genes/SCM
0   20900      48      1    0        inf
1   20900      37    130  103   1.262136
2   20900      35      1    1   1.000000
3   20900       1     79   66   1.196970
4   20900      66      5    3   1.666667

In [6]: pd.options.display.precision = 0

In [7]: df
Out[7]:
   genome  contig  genes  SCM  genes/SCM
0   20900      48      1    0        inf
1   20900      37    130  103          1
2   20900      35      1    1          1
3   20900       1     79   66          1
4   20900      66      5    3          2

Check your pandas and NumPy display options.
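A quick way to check this is sketched below, again on a made-up frame with values taken from the table above. It reads the current display precision and renders the same frame under two temporary settings with pd.option_context, which restores the previous value when the block exits.

```python
import pandas as pd

# Hypothetical sample; not the asker's real data.
df = pd.DataFrame({"genes": [130, 79, 5], "SCM": [103, 66, 3]})
df["genes/SCM"] = df["genes"] / df["SCM"]

# Inspect the current setting (6 unless something has changed it).
print(pd.get_option("display.precision"))

# Render the same frame under two temporary display settings.
with pd.option_context("display.precision", 0):
    shown_rounded = repr(df)

with pd.option_context("display.precision", 2):
    shown_two = repr(df)

print(shown_rounded)
print(shown_two)

# The stored values are full-precision floats in both cases.
print(df["genes/SCM"].tolist())
```

If the first print shows 0, something earlier in the session or a startup script has changed the option, and pd.reset_option("display.precision") restores the default.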

Upvotes: 4
