Zaesar

Reputation: 602

Pandas: Create header grouping columns in groups of two columns

I'm learning about MultiIndex, groupby and tuples by reading similar questions on Stack Overflow, searching Google and watching YouTube tutorials, and I've hit a complicated point.

How can I group an unknown number of columns in groups of two? This is what I have, a header and one row:

patterns    responses   patterns    responses   patterns   ...  
hello       hi          Where?      here        When?      ...

And I'm looking to create a header above the existing header that groups the columns in pairs, like:

a                       a                       a
patterns    responses   patterns    responses   patterns   ...  
hello       hi          Where?      here        When?      ...

Appreciate your time!

Upvotes: 0

Views: 1171

Answers (1)

Valdi_Bo

Reputation: 30991

Assume that your DataFrame initially has an "ordinary" (single-level) index on columns:

  patterns responses patterns.1 responses.1 patterns.2 responses.2
0    hello        hi     Where?        here      When?       there

Note that Pandas, e.g. when reading a DataFrame from a CSV file, by default appends numeric suffixes (.1, .2, ...) to repeated column names. For our purposes only the first two names (those without a suffix) are needed.
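To see this renaming in isolation, here is a minimal sketch. The CSV text is made-up sample data modelled on the question, read from an in-memory string instead of a file:

```python
import io
import pandas as pd

# Hypothetical sample input with repeated header names (not the asker's real file)
csv_text = "patterns,responses,patterns,responses\nhello,hi,Where?,here\n"

df = pd.read_csv(io.StringIO(csv_text))

# Repeated names get a numeric suffix; the first occurrence stays unchanged
cols = list(df.columns)
print(cols)
```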

Note also that the column titles at the added (top) level should not all be the same. To tell consecutive pairs of columns apart, I named them Q1, Q2 and so on.

To have a MultiIndex on columns, you can proceed as follows:

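import pandas as pd

cols = df.columns
nPairs = len(cols) // 2                       # number of (patterns, responses) pairs
h1 = [f'Q{i}' for i in range(1, nPairs + 1)]  # top-level labels: Q1, Q2, ...
# Pair every top-level label with the first two (unsuffixed) column names
df.columns = pd.MultiIndex.from_product([h1, cols[:2]])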

The result is:

        Q1                 Q2                 Q3          
  patterns responses patterns responses patterns responses
0    hello        hi   Where?      here    When?     there

Other possibilities to create a MultiIndex are e.g. from_arrays and from_tuples. Read about them and practice while learning Pandas.
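As a rough sketch of those two alternatives, the same column index could be built as follows. The DataFrame is rebuilt here from the sample row so the snippet runs on its own, and the names n_pairs, base, top and bottom are only illustrative:

```python
import pandas as pd

# Sample frame matching the one-row example (suffixed names as read_csv would produce)
df = pd.DataFrame(
    [["hello", "hi", "Where?", "here", "When?", "there"]],
    columns=["patterns", "responses", "patterns.1",
             "responses.1", "patterns.2", "responses.2"],
)
n_pairs = len(df.columns) // 2
base = list(df.columns[:2])  # ['patterns', 'responses']

# from_tuples: one (top, bottom) tuple per column
tuples = [(f"Q{i}", name) for i in range(1, n_pairs + 1) for name in base]
mi_tuples = pd.MultiIndex.from_tuples(tuples)

# from_arrays: one list per level, each as long as the number of columns
top = [f"Q{i}" for i in range(1, n_pairs + 1) for _ in base]
bottom = base * n_pairs
mi_arrays = pd.MultiIndex.from_arrays([top, bottom])

df.columns = mi_tuples
```

With the MultiIndex in place, a whole pair can be selected by its top-level label, e.g. df["Q2"].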

Edit following the question in a comment

One possible cause of your exception is that your input file contains:

  • An index column, possibly without a name.
  • Then a number of column pairs (patterns / responses).

In that case, when you read it with read_csv, you should tell it to use this first column as the index:

df = pd.read_csv('Input.csv', index_col=[0])

The number of "actual" columns is then one fewer, so my code should run without raising an exception.
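A small sketch of that situation, using made-up CSV text with an unnamed leading index column. That the original exception was a length mismatch is only an assumption here, since the comment with the error is not shown:

```python
import io
import pandas as pd

# Hypothetical file content: unnamed index column followed by two column pairs
csv_text = ",patterns,responses,patterns,responses\n0,hello,hi,Where?,here\n"

# Without index_col the index column is read as data ('Unnamed: 0'),
# giving an odd number of columns that cannot be split into pairs
raw = pd.read_csv(io.StringIO(csv_text))

# With index_col=[0] only the pattern/response columns remain
df = pd.read_csv(io.StringIO(csv_text), index_col=[0])

n_pairs = len(df.columns) // 2
h1 = [f"Q{i}" for i in range(1, n_pairs + 1)]
df.columns = pd.MultiIndex.from_product([h1, df.columns[:2]])
```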

Upvotes: 1
