DRay
DRay

Reputation: 27

How can I use Python to use repeating column value to separate rows in a dataframe?

I have an Excel file with a repeating column value that I want to use to group records for insertion into a database. My approach is to use Pandas. Here is a representative dataframe:

import pandas as pd
import numpy as np
df2 = pd.DataFrame({
    'a': ['foo', 'q1', 'q2', 'q3', 'foo', 'q1', 'q2', 'q3'],
    'b': ['bar', 'Zee', np.nan, 500, 'baz', 'Jay', 'Yes', 100]})

I want transpose it to this:

df3 = pd.DataFrame({
    'foo': ['bar', 'baz'],
    'q1': ['Zee', 'Jay'],
    'q2': [numpy.nan, 'Yes'],
    'q3': [500, 100]})

by using the 'foo' value to separate rows or records. How can I do this?

Upvotes: 0

Views: 42

Answers (2)

wwnde
wwnde

Reputation: 26676

Establish column group by boolean selecting foo and using .cumsum() method. .groupby group and a and b to list and unstack a.

df2.assign(group=(df2.a=='foo').cumsum()).groupby(['group','a'])['b'].apply(lambda x: pd.DataFrame(x.tolist())).unstack('a').reset_index(drop=True)

a  foo   q1   q2   q3
0  bar  Zee  NaN  500
1  baz  Jay  Yes  100

Upvotes: 0

BENY
BENY

Reputation: 323326

We can do cumsum create the index then do pivot

df=df2.assign(index=df2.a.eq('foo').cumsum()).pivot('index','a','b')
df
a      foo   q1   q2   q3
index                    
1      bar  Zee  NaN  500
2      baz  Jay  Yes  100

Or we do groupby + cumcount

df=df2.assign(index=df2.groupby('a').cumcount()).pivot('index','a','b')
df
a      foo   q1   q2   q3
index                    
0      bar  Zee  NaN  500
1      baz  Jay  Yes  100

Upvotes: 2

Related Questions