Reputation: 602
I'm learning about MultiIndex, groupby and tuples, and have been reading similar questions on Stack Overflow, Google search results and YouTube tutorials, but I've hit a tricky point.
How can I group an unknown number of columns into groups of two? This is what I have, a header and one row:
patterns responses patterns responses patterns ...
hello hi Where? here When? ...
I'd like to add a second header level above the existing one that groups the columns in pairs, like:
a a a
patterns responses patterns responses patterns ...
hello hi Where? here When? ...
Appreciate your time!
Upvotes: 0
Views: 1171
Reputation: 30991
Assume that your DataFrame initially has an "ordinary" (single-level) index on its columns:
patterns responses patterns.1 responses.1 patterns.2 responses.2
0 hello hi Where? here When? there
Note that when Pandas reads a DataFrame, e.g. from a CSV file, it by default appends numeric suffixes to repeated column names. For our purposes only the first 2 names (those without a numeric suffix) are needed.
Note also that the column labels at the added (top) level should not all be the same. To tell consecutive pairs of columns apart, I named them Q1, Q2 and so on.
To have a MultiIndex on columns, you can proceed as follows:
import pandas as pd

cols = df.columns
nPairs = len(cols) // 2                        # number of (patterns, responses) pairs
h1 = [f'Q{i}' for i in range(1, nPairs + 1)]   # top-level labels: Q1, Q2, ...
df.columns = pd.MultiIndex.from_product([h1, cols[:2]])
The result is:
Q1 Q2 Q3
patterns responses patterns responses patterns responses
0 hello hi Where? here When? there
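To try the steps above without an input file, here is a minimal self-contained sketch. The sample DataFrame is built by hand to mirror the question, and the names `n_pairs`, `top` and `first_pair` are only illustrative.

```python
import pandas as pd

# Hand-built sample (an assumption) with the suffixed names read_csv would produce
df = pd.DataFrame(
    [["hello", "hi", "Where?", "here", "When?", "there"]],
    columns=["patterns", "responses", "patterns.1", "responses.1",
             "patterns.2", "responses.2"],
)

cols = df.columns
n_pairs = len(cols) // 2
top = [f"Q{i}" for i in range(1, n_pairs + 1)]

# The product has len(top) * 2 entries, so with an odd number of columns
# this assignment fails with a length mismatch.
df.columns = pd.MultiIndex.from_product([top, cols[:2]])

# Selecting a top-level label returns that pair of columns
first_pair = df["Q1"]
```

Once the top level is in place, a single pair can be picked with `df["Q2"]`, or a single cell column with a tuple such as `df[("Q2", "responses")]`.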
Other ways to create a MultiIndex include from_arrays and from_tuples. Read about them and practice while learning Pandas.
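As a rough comparison of the three constructors, the sketch below builds the same two-pair column index each way. The label lists are made up for illustration.

```python
import pandas as pd

top = ["Q1", "Q1", "Q2", "Q2"]
bottom = ["patterns", "responses", "patterns", "responses"]

# from_arrays: one list per level, each as long as the number of columns
mi_arrays = pd.MultiIndex.from_arrays([top, bottom])

# from_tuples: one (top, bottom) tuple per column
mi_tuples = pd.MultiIndex.from_tuples(list(zip(top, bottom)))

# from_product: every combination of the distinct labels at each level
mi_product = pd.MultiIndex.from_product([["Q1", "Q2"], ["patterns", "responses"]])
```

from_product is the shortest when every top-level label has the same set of sub-columns, while from_arrays and from_tuples also cover irregular layouts.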
One possible cause of your exception is that your input file contains an additional column that actually holds the index.
In that case, when you read the file with read_csv, tell Pandas to use this column as the index:
df = pd.read_csv('Input.csv', index_col=[0])
Then the number of "actual" columns will be one fewer, so my code should run without an exception.
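The effect of index_col on the column count can be checked with a small in-memory file. The CSV text below is a hypothetical stand-in for Input.csv, assuming its first column holds the row index.

```python
import io
import pandas as pd

# Hypothetical file content: an unnamed leading index column plus two pairs
csv_text = ",patterns,responses,patterns,responses\n0,hello,hi,Where?,here\n"

# Without index_col the leading column counts as data: 5 columns, an odd number
raw = pd.read_csv(io.StringIO(csv_text))

# With index_col=[0] it becomes the index, leaving 4 "actual" columns
df = pd.read_csv(io.StringIO(csv_text), index_col=[0])

cols = df.columns
n_pairs = len(cols) // 2
top = [f"Q{i}" for i in range(1, n_pairs + 1)]
df.columns = pd.MultiIndex.from_product([top, cols[:2]])
```

With the 5-column `raw` frame, the same assignment would fail because the product yields only 4 labels.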
Upvotes: 1