Associating units with Pandas DataFrame

Question

I'm using a web service that returns a CSV response in which the 1st row contains the column names, and the 2nd row contains the column units, for example:

longitude,latitude
degrees_east,degrees_north
-142.842,-1.82
-25.389,39.87
-37.704,27.114

I can read this into a Pandas DataFrame:

import pandas as pd
from StringIO import StringIO

x = '''
longitude,latitude
degrees_east,degrees_north
-142.842,-1.82
-25.389,39.87
-37.704,27.114
'''

# Create a Pandas DataFrame
obs=pd.read_csv(StringIO(x.strip()), sep=",\s*")

print(obs)

which produces

      longitude       latitude
0  degrees_east  degrees_north
1      -142.842          -1.82
2       -25.389          39.87
3       -37.704         27.114

But what would be the best approach to associate the units with the DataFrame columns for later use, for example labeling plots?

unutbu · Accepted Answer

Allowing pandas to read the second line as data is screwing up the dtype for the columns. Instead of a float dtype, the presence of strings make the dtype of the columns object, and the underlying objects, even the numbers, are strings. This screws up all numerical operations:

In [8]: obs['latitude']+obs['longitude']
Out[8]: 
0    degrees_northdegrees_east
1                -1.82-142.842
2                 39.87-25.389
3                27.114-37.704

In [9]: obs['latitude'][1]
Out[9]: '-1.82'

So it is imperative that pd.read_csv skip the second line.

The following is pretty ugly, but given the format of the input, I don't see a better way.

import pandas as pd
from StringIO import StringIO

x = '''
longitude,latitude
degrees_east,degrees_north
-142.842,-1.82
-25.389,39.87
-37.704,27.114
'''

content = StringIO(x.strip())

def read_csv(content):
    columns = next(content).strip().split(',')
    units = next(content).strip().split(',')
    obs = pd.read_table(content, sep=",\s*", header=None)
    obs.columns = ['{c} ({u})'.format(c=col, u=unit)
                   for col, unit in zip(columns, units)]
    return obs

obs = read_csv(content)
print(obs)
#    longitude (degrees_east)  latitude (degrees_north)
# 0                  -142.842                    -1.820
# 1                   -25.389                    39.870
# 2                   -37.704                    27.114
print(obs.dtypes)
# longitude (degrees_east)    float64
# latitude (degrees_north)    float64

Associating units with Pandas DataFrame

Answers (2)

Related Questions