How can I use Python to separate rows in a dataframe using a repeating column value?

Question

I have an Excel file with a repeating column value that I want to use to group records for insertion into a database. My approach is to use Pandas. Here is a representative dataframe:

import pandas as pd
import numpy as np
df2 = pd.DataFrame({
    'a': ['foo', 'q1', 'q2', 'q3', 'foo', 'q1', 'q2', 'q3'],
    'b': ['bar', 'Zee', np.nan, 500, 'baz', 'Jay', 'Yes', 100]})

I want to transpose it to this:

df3 = pd.DataFrame({
    'foo': ['bar', 'baz'],
    'q1': ['Zee', 'Jay'],
    'q2': [np.nan, 'Yes'],
    'q3': [500, 100]})

by using the 'foo' value to separate rows or records. How can I do this?

wwnde · Accepted Answer

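Create a group column by comparing column a to 'foo' and taking the .cumsum() of the resulting boolean mask, so that each 'foo' row starts a new group number. Then .groupby on group and a, collect b into a list, and unstack a so that its values become the columns.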

df2.assign(group=(df2.a=='foo').cumsum()).groupby(['group','a'])['b'].apply(lambda x: pd.DataFrame(x.tolist())).unstack('a').reset_index(drop=True)

a  foo   q1   q2   q3
0  bar  Zee  NaN  500
1  baz  Jay  Yes  100
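
For comparison, here is a minimal sketch of the same cumsum grouping idea, but using DataFrame.pivot in place of the groupby/apply step. This is a swapped-in variant, not the code from the answer above. The names group and result are only illustrative, and it assumes each label in a appears at most once between consecutive 'foo' rows (pivot raises an error on duplicates) and a pandas version that accepts keyword arguments to pivot.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'a': ['foo', 'q1', 'q2', 'q3', 'foo', 'q1', 'q2', 'q3'],
    'b': ['bar', 'Zee', np.nan, 500, 'baz', 'Jay', 'Yes', 100]})

# Every 'foo' row starts a new record, so a running count of
# 'foo' rows gives each record its own label: 1, 1, 1, 1, 2, 2, 2, 2
group = (df2['a'] == 'foo').cumsum()

result = (
    df2.assign(group=group)
       # One output row per group, one output column per label in 'a'
       .pivot(index='group', columns='a', values='b')
       .reset_index(drop=True)
)
result.columns.name = None  # drop the leftover 'a' axis label

print(result)
```

Note that pivot orders the new columns by label, which happens to match the desired order here; with other labels the columns may need to be reordered afterwards.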
