Parth
Parth

Reputation: 21

Python Replace Two Headers with One

I currently have a dataframe that looks like this:

     | Unnamed:1|Unnamed:2 |Unnamed:3 |
     | Column 1 | Column 2 | Column 3 | 
0    | Data     | Data     | Data     |
1    | Data     | Data     | Data     |
2    | Data     | Data     | Data     |
3    | Data     | Data     | Data     |

I've been using the following code unsuccessfully as it replaces the header with the index [0] row:

new_header = df.iloc[0] 
df = df[1:] 
df.columns = new_header

Ideally the new dataframe would look like:

     | Column 1 | Column 2 | Column 3 | 
0    | Data     | Data     | Data     |
1    | Data     | Data     | Data     |
2    | Data     | Data     | Data     |
3    | Data     | Data     | Data     |

Upvotes: 0

Views: 444

Answers (2)

Cameron Riddell
Cameron Riddell

Reputation: 13407

It seems that your data is being read in as having a pd.MultiIndex column. This means that functionally each of your column labels is actually a tuple of values:

[("Unnamed:1", "Column 1"), ("Unnamed:2", "Column 2"), ...]

To fix this you can drop the 0th level of your columns using df.droplevel(0, axis=1) method. This will drop the first element of each of those aforementioned tuples. We specify axis=1 because we want to drop the 0th level of the column index, and not the row index.

Snippet to try:

df = df.droplevel(0, axis=1)

Reproducible snippet:

columns = pd.MultiIndex.from_arrays(
    [["Unnamed:1", "Unnamed:2", "Unnamed:3"], 
     ["Column 1", "Column 2", "Column 3"]]
)
data = [[1,2,3],[4,5,6],[7,8,9]]
df = pd.DataFrame(data, columns=columns)

print(df)
  Unnamed:1 Unnamed:2 Unnamed:3
   Column 1  Column 2  Column 3
0         1         2         3
1         4         5         6
2         7         8         9

Now using .droplevel(...):

df = df.droplevel(0, axis=1)
print(df)

   Column 1  Column 2  Column 3
0         1         2         3
1         4         5         6
2         7         8         9

Upvotes: 1

brunoff
brunoff

Reputation: 4209

You are dealing with Multiindex columns. If you want to keep just the second header line, this would be enough: df.columns = df.columns.get_level_values(1)

Upvotes: 1

Related Questions