Abdall

Pandas - read_html with index_col gives unintended output from to_html

I may just not fully understand pandas, but I am seeing unexpected behavior when I use read_html() with index_col set, modify the DataFrame, and then call to_html() on it.

Here is what I mean. I have this HTML file:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>index</th>
      <th>Avg</th>
      <th>Min</th>
      <th>Max</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>build1</td>
      <td>55.102323</td>
      <td>37.101219</td>
      <td>60.7</td>
    </tr>
  </tbody>
</table>

I then use pandas read_html as follows:

import pandas as pd

dataFrameList = pd.read_html('empty.html', index_col=0)
df = dataFrameList[0]

This produces a data frame as follows:

              Avg        Min   Max
index                             
build1  55.102323  37.101219  60.7

I then have a small bit of test code that looks like this:

df.drop(['build1'], inplace=True)
df.loc['build2'] = [121212, 12443, 1290120]
print(df.to_html())

I get the following output:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Avg</th>
      <th>Min</th>
      <th>Max</th>
    </tr>
    <tr>
      <th>index</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>build2</th>
      <td>121212.0</td>
      <td>12443.0</td>
      <td>1290120.0</td>
    </tr>
  </tbody>
</table>
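Here is a minimal, self-contained reproduction of the steps above for readers without empty.html. As an assumption, the DataFrame is built directly instead of being read from the file. The printed frame above shows that read_html(..., index_col=0) turns the header text "index" into the index name, so the stand-in gives its index that name. The sketch suggests the extra header row comes from that index name, which survives both drop() and the .loc enlargement.

```python
import pandas as pd

# Assumed stand-in for pd.read_html('empty.html', index_col=0)[0]: the first
# column becomes the index, and its header text "index" becomes the index *name*.
df = pd.DataFrame(
    {"Avg": [55.102323], "Min": [37.101219], "Max": [60.7]},
    index=pd.Index(["build1"], name="index"),
)

# The same test steps as in the question.
df.drop(["build1"], inplace=True)
df.loc["build2"] = [121212, 12443, 1290120]

# The index name is still set after drop() and the .loc enlargement.
print(df.index.name)

# With a named index, to_html() writes a second header row holding that name.
html_named = df.to_html()
print(html_named.count("<tr"))  # column header row + index-name row + one body row

# Clearing the index name removes that extra header row.
html_unnamed = df.rename_axis(None).to_html()
print(html_unnamed.count("<tr"))  # column header row + one body row
```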

What am I doing wrong? I tried passing index=False to to_html(), but that removes the build names, which I need.

To be clear, my desired output is:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>index</th>
      <th>Avg</th>
      <th>Min</th>
      <th>Max</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>build2</th>
      <td>121212.0</td>
      <td>12443.0</td>
      <td>1290120.0</td>
    </tr>
  </tbody>
</table>
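One way to get exactly this layout is sketched below. It assumes to_html() keeps its current rendering rules: the *columns* axis name goes into the top-left header cell, and the extra header row only appears when the *index* has a name. Moving the name from the index axis to the columns axis should therefore give a single header row with "index" in the corner, while build2 stays a <th> in the body. The frame is rebuilt by hand here as a stand-in for the one in the question.

```python
import pandas as pd

# Assumed stand-in for the frame after the question's test steps:
# float columns and an index named "index".
df = pd.DataFrame(
    {"Avg": [121212.0], "Min": [12443.0], "Max": [1290120.0]},
    index=pd.Index(["build2"], name="index"),
)

# Move the label from the index axis to the columns axis: the columns name is
# written into the top-left header cell, and with no index name there is no
# second header row.
df.columns.name = df.index.name
df.index.name = None

html = df.to_html()
print(html)
```

This changes the axis names on df itself, so it is best done right before calling to_html(), or on a copy.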


Answers (1)

noisefield

There is a workaround:

df.insert(0, 'index', df.index)
print(df.to_html(index=False))

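This produces the desired output, except that build2 comes out as <td>build2</td> instead of <th>build2</th>, because with index=False every body cell is a <td>. I guess the <th> in your desired output is a typo?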
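Here is a self-contained version of the workaround above, plus an equivalent using reset_index(), which moves a named index into an ordinary first column with that name. Both hide the real index with index=False, so only one header row is written and build2 lands in a <td>. As an assumption, the frame is rebuilt by hand to stand in for the one from the question.

```python
import pandas as pd

# Assumed stand-in for the question's frame after drop() and .loc: index named "index".
df = pd.DataFrame(
    {"Avg": [121212.0], "Min": [12443.0], "Max": [1290120.0]},
    index=pd.Index(["build2"], name="index"),
)

# The workaround from this answer, applied to a copy so df is left untouched.
with_col = df.copy()
with_col.insert(0, "index", with_col.index)
html_insert = with_col.to_html(index=False)

# Equivalent without insert(): reset_index() returns a new frame whose first
# column is the former index, named after it ("index").
html_reset = df.reset_index().to_html(index=False)

print(html_reset)
```

reset_index() avoids mutating df and keeps the column name in sync with whatever the index is called.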
