Windtalker
Windtalker

Reputation: 796

Round pandas dataframe numeric values in string type columns

I did searched online posts but what I found were all how to only round float columns in a mixed dataframe, but my problem is how to round float values in a string type column.

Say my dataframe like this:

pd.DataFrame({'a':[1.1111,2.2222, 'aaaa'], 'b':['bbbb', 2.2222,3.3333], 'c':[3.3333,'cccc', 4.4444]})

Looking for an output like

pd.DataFrame({'a':[1.1,2.2, 'aaaa'], 'b':['bbbb', 2.2,3.3], 'c':[3.3,'cccc', 4.4]})

----Above is a straight question------

----Reason why I do so is below----

I have 3 csv files, each has string header and float value, with different row and column number.

I need to append the 3 in one dataframe horizontally then expoert as a new csv, separate by a empty row.

My 3 dataframe like this:


One:

enter image description here

Two:

enter image description here

Three:

enter image description here

to

enter image description here

Pls note that the output dataframe contains headers from the 3 sub dataframe

So, when I import them, first csv of course imported by pd.read_csv, no issue.

Then I used .append(pd.Series([np.NaN])) to create an empty row as separator row

Then second csv loaded then I used pd.append(), but if I don't include 'header=None' in 'read_csv()' then the second one will not be mapped horizontally under first one, coz the csv files have uneven rows and columns.

So two options,

  1. Include 'header=None' in 'read_csv()', then I can't simply use round() as

    df = df.round()

does not work, need to find a way to round only numeric values in each column

Also note that when include 'header=None', All column types are 'object', by df.types

  1. Not include 'header=None' in 'read_csv()', then I could round each dataframe, but having trouble to combine them horizontally with their headers.

Any suggestion?


csv example

import pandas as pd
import io
exp = io.StringIO("""
month;abc;cba;fef;sefe;yjy;gtht
100;0.45384534;0.43455;0.56385;0.5353;0.523453;0.53553
200;0.453453;0.453453;0.645396;0.76786;0.36327;0.453659
""")

df = pd.read_csv(exp, sep=";", header=None)
print(df.dtypes)    
df = df.applymap(lambda x: round(x, 1) 
    if isinstance(x, (int, float)) else x)
print(df) 

Upvotes: 3

Views: 3844

Answers (1)

Eric B
Eric B

Reputation: 1675

There is a simple way to loop over every single element in a dataframe using applymap. Combined with isinstance, which test for a specific type, you can get the following.

df = pd.DataFrame({'a':[1.1111,2.2222, 'aaaa'], 'b':['bbbb', 2.2222,3.3333], 'c':[3.3333,'cccc', 4.4444]})

df.dtypes

a    object
b    object
c    object
dtype: object

df2 = df.applymap(lambda x: round(x, 1) if isinstance(x, (int, float)) else x)

Obtaining the following dataframe:

      a     b     c
0   1.1  bbbb   3.3
1   2.2   2.2  cccc
2  aaaa   3.3   4.4

With the following dtypes unchanged

df2.dtypes

a    object
b    object
c    object
dtype: object

As for your other example in your question, I noticed that even the numbers are saved as strings. I noticed a method converting strings to floats pd.to_numeric for a Series.

From your exp, I get the following:

df = pd.read_csv(exp, sep=";", header=None)
df2 = df.apply(lambda x: pd.to_numeric(x, errors='ignore'), axis=1)
df3 = df2.applymap(lambda x: round(x, 1) if isinstance(x, (int, float)) else x)

Upvotes: 4

Related Questions