Reputation: 114240
I am attempting to read in a CSV file with hexadecimal numbers in the index column:
InputBits, V0, V1, V2, V3
7A, 0.000594457716, 0.000620631282, 0.000569834178, 0.000625374384,
7B, 0.000601155649, 0.000624282078, 0.000575955914, 0.000632111367,
7C, 0.000606026872, 0.000629149805, 0.000582689823, 0.000634561234,
7D, 0.000612115902, 0.000634625998, 0.000584526357, 0.000638235952,
7E, 0.000615769413, 0.000637668328, 0.000590648093, 0.00064987256,
7F, 0.000620640637, 0.000643144494, 0.000594933308, 0.000650485013,
I can do it using the following code:
df = pd.read_csv('data.csv', index_col=False,
                 converters={'InputBits': lambda x: int(x, 16)})
df.set_index('InputBits', inplace=True)
The problem is that this seems unnecessarily clunky. Is there a way to do something equivalent to the following?
df = pd.read_csv('data.csv', converters={'InputBits': lambda x: int(x, 16)})
This fails because the column labeled InputBits is now the first data column, so the converter is applied to its float strings:
ValueError: invalid literal for int() with base 16: ' 0.000594457716'
Upvotes: 5
Views: 5211
Reputation: 114240
As @root pointed out here, the issue in this example is a misalignment between the header and the data rows, which all end in a trailing comma. In fact, the documentation deals with this specific scenario:
If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index (row names)
The solution here was first to run
sed -i 's/, \r$//' data.csv
to get rid of the final commas (and Windows line endings). Then, the expected command works almost out of the box:
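If sed isn't available (e.g. on Windows), a pure-Python equivalent is only a few lines; the helper name here is just an illustration:

```python
from pathlib import Path

def strip_trailing_commas(csv_path):
    """Drop each line's trailing comma (plus any trailing whitespace/CR) in place."""
    p = Path(csv_path)
    # read_text() opens in universal-newline mode, so Windows \r\n is
    # normalized to \n before we strip the trailing comma from each line.
    lines = [line.rstrip().rstrip(',').rstrip() for line in p.read_text().splitlines()]
    p.write_text('\n'.join(lines) + '\n')
```

Calling `strip_trailing_commas('data.csv')` then leaves the file in the shape the read_csv call below expects.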
pd.read_csv('data.csv', index_col='InputBits',
            converters={'InputBits': lambda x: int(x, 16)})
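As a quick end-to-end check on a cleaned in-memory sample: the converter runs during parsing, so the index comes out as ints directly.

```python
import io
import pandas as pd

# Cleaned two-row sample (trailing commas already stripped).
data = (
    "InputBits, V0, V1\n"
    "7A, 0.000594457716, 0.000620631282\n"
    "7B, 0.000601155649, 0.000624282078\n"
)
df = pd.read_csv(io.StringIO(data), index_col='InputBits',
                 converters={'InputBits': lambda x: int(x, 16)})
print(df.index.tolist())  # [122, 123] -- 0x7A and 0x7B as ints
```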
Upvotes: 2