Mehdi Zare

Reputation: 1381

Transform a pandas dataframe in Python

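I need to apply a custom transformation to a DataFrame: each element of the list in 'measure' should become its own row, with 'value' repeated alongside it. For example, going from df to transformed_df: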

import pandas as pd

df = pd.DataFrame({
    'value': ['a'],
    'measure':[['b', 'c']]
})

transformed_df = pd.DataFrame({
    'measure': ['b', 'c'],
    'value': ['a', 'a']
})

What's an efficient way of getting from df to transformed_df?

Upvotes: 2

Views: 175

Answers (2)

dubbbdan

Reputation: 2720

One approach to the problem would be to think of it as constructing a MultiIndex:

value = ['a']
measure = ['b', 'c']
idx = pd.MultiIndex.from_product([value, measure], names=['value', 'measure'])

df = pd.DataFrame(index=idx).reset_index()

where df is:

  value measure
0     a       b
1     a       c
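The snippet above hard-codes a single value and its list. Here is a minimal sketch of applying the same from_product idea to every row of a DataFrame. It assumes each row holds a scalar in 'value' and a list in 'measure', and the helper name explode_via_product is hypothetical:

```python
import pandas as pd

def explode_via_product(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build one MultiIndex per row with
    from_product, turn it into columns, then stack all rows."""
    frames = []
    for value, measure in zip(df['value'], df['measure']):
        # Cartesian product of a one-element list with this row's list
        idx = pd.MultiIndex.from_product([[value], measure],
                                         names=['value', 'measure'])
        frames.append(pd.DataFrame(index=idx).reset_index())
    # Concatenate per-row frames and renumber the index from 0
    return pd.concat(frames, ignore_index=True)

df = pd.DataFrame({
    'value': ['a', 'x'],
    'measure': [['b', 'c'], ['y']]
})

result = explode_via_product(df)
print(result)
```

This loops in Python once per row, so it mainly shows that the MultiIndex approach generalizes. It is not meant as a faster alternative.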

Having never seen the explode method before, I was curious how it compares, so I ran some timing tests:

def test_multi(value, measure):
    idx = pd.MultiIndex.from_product([value, measure], names=['value', 'measure'])
    df = pd.DataFrame(index=idx).reset_index()
    return df

def test_explode(df):
    return df.explode('measure').reset_index(drop=True)


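# from_product pairs each of these 10,000 values with each of the 20,000
# measures, so this call builds 200,000,000 rows (the explode test builds 20,000)
value = ['a']*10000
measure = ['b','c']*10000

%timeit test_multi(value, measure)  # IPython/Jupyter magic
#13 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)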

value =  ['a']*10000
measure = [['b','c']]*10000


df = pd.DataFrame({
    'value': value,
    'measure':measure
})

%timeit test_explode(df)
#16.9 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 1

Scott Boston

Reputation: 153460

Try pd.DataFrame.explode (available since pandas 0.25):

df.explode('measure').reset_index(drop=True)

Output:

  value measure
0     a       b
1     a       c
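The same call handles any number of rows. Here is a minimal sketch, assuming pandas 1.1 or later for the ignore_index parameter, which replaces the separate reset_index(drop=True) step. The example data is made up for illustration. explode keeps the original column order, so selecting the columns puts 'measure' first, as in transformed_df. An empty list becomes a single NaN row rather than being dropped:

```python
import pandas as pd

df = pd.DataFrame({
    'value': ['a', 'x', 'z'],
    'measure': [['b', 'c'], ['y'], []]   # the empty list becomes a NaN row
})

# One output row per list element; ignore_index=True renumbers from 0
out = df.explode('measure', ignore_index=True)

# Reorder columns to match transformed_df from the question
out = out[['measure', 'value']]
print(out)
```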

Upvotes: 3
