Nobody

Reputation: 77

Pandas reading scientific data

I have a csv-file with many columns containing something like

"4.2515014131285567e-001"

Pandas reads it as an object, so calculations don't work.

For example, multiplying by 2 gives me:

"4.2515014131285567e-0014.2515014131285567e-001"

How can I treat it as a number and do some math with it?

I tried setting "dtype=str", "dtype=float" and similar options, but nothing worked.
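For reference, a minimal sketch reproducing the symptom: when the column arrives as strings (object dtype), * repeats the text instead of multiplying:

import pandas as pd

# a string column: multiplication concatenates the text instead of doing math
s = pd.Series(["4.2515014131285567e-001"])
print(s * 2)  # -> "4.2515014131285567e-0014.2515014131285567e-001"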

Upvotes: 0

Views: 127

Answers (2)

Valdi_Bo

Reputation: 31011

Try the following test:

Create the following DataFrame, using read_csv, but from a text buffer:

import pandas as pd
from io import StringIO

txt = '''c1,c2,c3
Xxxxx,4.2515014131285567e-001,4.2515014131285555e-001
Yyyyy,4.2515014131284444e-001,4.2515014131283333e-001
Zzzzz,4.2515014131282222e-001,4.2515014131281111e-001'''

# read the CSV from an in-memory text buffer
df = pd.read_csv(StringIO(txt))

Then check the column types with df.info(). For both c2 and c3 you should get float64.

If you execute df.c2 * 2, you should get the doubled values. Don't worry about the smaller number of decimal digits displayed; that is just a matter of Pandas display options.
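For instance, continuing with the DataFrame above, you can raise the display precision with a standard Pandas option:

# show up to 16 significant digits instead of the default 6
pd.set_option('display.precision', 16)
print(df.c2 * 2)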

You can display an individual number with almost full precision, using df.loc[0, 'c2'] (I got 0.4251501413128557).

You should get the same results even if the numbers are surrounded by, e.g., double quotes.
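A quick sketch checking that claim (same kind of buffer, values quoted):

import pandas as pd
from io import StringIO

txt2 = '''c1,c2
Xxxxx,"4.2515014131285567e-001"
Yyyyy,"4.2515014131284444e-001"'''
df2 = pd.read_csv(StringIO(txt2))
print(df2.dtypes)  # c2 is still float64: the CSV parser strips the quotes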

Up to now everything was OK, but now try a second test:

In row 3, column c2, remove the e in front of -001, so the value becomes 4.2515014131282222-001, then run read_csv again.

The changed value is no longer a properly formatted float, so read_csv falls back to the object dtype (actually strings) for column c2 (you can confirm this with df.info()).

My assumption is that somewhere in your text file the format of one number is similarly "corrupted", and just this prevents read_csv from reading that column as float.

To locate the source of this error, run:

df.c2 = pd.to_numeric(df.c2, errors='coerce')

(replacing c2 with the proper column name) and then look in this column for NaN values.

Then look at the corresponding row in the input file and correct the error.
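Put together, a sketch of locating the offending rows (assuming your DataFrame is df and the column is named c2):

import pandas as pd

# unparseable entries become NaN instead of raising an error
df.c2 = pd.to_numeric(df.c2, errors='coerce')

# list the rows whose c2 value failed to parse
print(df[df.c2.isna()])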

Alternative: df.dropna(inplace=True) removes each row containing NaN in any column. You may also add a subset=['column_name'] parameter to drop only rows with NaN in that one column.
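For example, to drop only the rows where c2 failed to parse (same hypothetical column name as above):

# drop rows with NaN in the c2 column only, keeping the rest
df.dropna(subset=['c2'], inplace=True)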

Upvotes: 1

krewsayder

Reputation: 446

With some pre-processing, you can convert the data on import and remove non-float records before importing, if any exist.

Initial dataset in test.txt:

Math
4.2515014131285567e-001
asdas
123123
asdasd124
123
125423414asd

The helper below tests whether a value parses as a float, returning True/False; the loop uses it to build a list of line numbers to skip.

import pandas as pd

def isFloat(val):
    # return True if val parses as a float, False otherwise
    try:
        float(val)
        return True
    except ValueError:
        return False

with open('test.txt', 'r') as f:
    skiplines = []
    for i, v in enumerate(f.readlines()):
        # record line numbers whose first field is not a valid float
        if not isFloat(v.split(',')[0]):
            skiplines.append(i)

# the header line is not a float either, but we want to keep it
del skiplines[0]

# the converter key must match the column header ('Math' in test.txt)
converter = {'Math': lambda x: float(x)}
df = pd.read_csv('test.txt', converters=converter, skiprows=skiplines)
The lambda could also just be a data type such as float. I like demonstrating converters because you can easily round values or apply other logic here if you need to.
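For instance, a sketch of a converter that also rounds during import (using the 'Math' column and the skiplines list from above):

# round each parsed value to 4 decimal places while reading
converter = {'Math': lambda x: round(float(x), 4)}
df = pd.read_csv('test.txt', converters=converter, skiprows=skiplines)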

The final DataFrame looks as expected (note the trailing zeros, because I have not set a display format):

print(df)
           Math
0       0.42515
1  123123.00000
2     123.00000

Upvotes: 1
