Reputation: 27
I have an Excel file with a repeating column value that I want to use to group records for insertion into a database. My approach is to use Pandas. Here is a representative dataframe:
import pandas as pd
import numpy as np
df2 = pd.DataFrame({
'a': ['foo', 'q1', 'q2', 'q3', 'foo', 'q1', 'q2', 'q3'],
'b': ['bar', 'Zee', np.nan, 500, 'baz', 'Jay', 'Yes', 100]})
I want transpose it to this:
df3 = pd.DataFrame({
'foo': ['bar', 'baz'],
'q1': ['Zee', 'Jay'],
'q2': [numpy.nan, 'Yes'],
'q3': [500, 100]})
by using the 'foo' value to separate rows or records. How can I do this?
Upvotes: 0
Views: 42
Reputation: 26676
Establish column group
by boolean selecting foo
and using .cumsum()
method. .groupby
group
and a
and b
to list and unstack
a
.
df2.assign(group=(df2.a=='foo').cumsum()).groupby(['group','a'])['b'].apply(lambda x: pd.DataFrame(x.tolist())).unstack('a').reset_index(drop=True)
a foo q1 q2 q3
0 bar Zee NaN 500
1 baz Jay Yes 100
Upvotes: 0
Reputation: 323326
We can do cumsum
create the index then do pivot
df=df2.assign(index=df2.a.eq('foo').cumsum()).pivot('index','a','b')
df
a foo q1 q2 q3
index
1 bar Zee NaN 500
2 baz Jay Yes 100
Or we do groupby
+ cumcount
df=df2.assign(index=df2.groupby('a').cumcount()).pivot('index','a','b')
df
a foo q1 q2 q3
index
0 bar Zee NaN 500
1 baz Jay Yes 100
Upvotes: 2