I need to apply a custom transformation to a dataframe like this:
import pandas as pd
df = pd.DataFrame({
    'value': ['a'],
    'measure': [['b', 'c']]
})
transformed_df = pd.DataFrame({
    'measure': ['b', 'c'],
    'value': ['a', 'a']
})
What's an efficient way of getting from df to transformed_df?
One approach to the problem would be to think of it as constructing a MultiIndex:
value = ['a']
measure = ['b','c']
idx = pd.MultiIndex.from_product([value, measure], names=['value', 'measure'])
df = pd.DataFrame(index=idx).reset_index()
where df is:
  value measure
0     a       b
1     a       c
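from_product pairs every value with every measure, so this reproduces transformed_df only when df has a single row. A hedged sketch of carrying the same MultiIndex idea over to several rows, assuming each row's value should be paired only with the elements of its own measure list, is to build those pairs explicitly and pass them to pd.MultiIndex.from_tuples. The two-row df below is illustrative data, not from the question.

```python
import pandas as pd

# Illustrative input: each row has a scalar 'value' and a list in 'measure'.
df = pd.DataFrame({
    'value': ['a', 'x'],
    'measure': [['b', 'c'], ['d']],
})

# Pair each row's 'value' with every element of that row's own 'measure' list.
pairs = [(v, m) for v, ms in zip(df['value'], df['measure']) for m in ms]

# Same trick as above: an empty frame indexed by the pairs, then reset_index().
idx = pd.MultiIndex.from_tuples(pairs, names=['value', 'measure'])
out = pd.DataFrame(index=idx).reset_index()
print(out)
```

Unlike from_product, this never crosses one row's value with another row's measures, which is the same per-row behaviour explode provides.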
Having never seen the explode method before, I was curious to run some timing tests:
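def test_multi(value, measure):
    # Cartesian product of value x measure, turned back into two columns
    idx = pd.MultiIndex.from_product([value, measure], names=['value', 'measure'])
    df = pd.DataFrame(index=idx).reset_index()
    return df

def test_explode(df):
    # One output row per element of each row's 'measure' list
    return df.explode('measure').reset_index(drop=True)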
value = ['a'] * 10000
measure = ['b', 'c'] * 10000
# from_product pairs every value with every measure: 10000 * 20000 = 200,000,000 rows
%timeit test_multi(value, measure)
#13 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
value = ['a'] * 10000
measure = [['b', 'c']] * 10000
df = pd.DataFrame({
    'value': value,
    'measure': measure
})
# 10000 rows holding two-element lists: 20,000 rows after explode
%timeit test_explode(df)
#16.9 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Try pd.DataFrame.explode:
df.explode('measure').reset_index(drop=True)
Output:
  value measure
0     a       b
1     a       c
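A short sketch of the same call on a frame with several rows, using illustrative data that is not from the question. It assumes pandas 1.1 or later, where explode accepts ignore_index=True as a shorthand for chaining .reset_index(drop=True). explode keeps the original column order, so the columns are reselected to match the measure-first layout of the question's transformed_df.

```python
import pandas as pd

# Illustrative input: lists of different lengths, including an empty one.
df = pd.DataFrame({
    'value': ['a', 'x', 'y'],
    'measure': [['b', 'c'], ['d'], []],
})

# ignore_index=True (pandas >= 1.1) renumbers the rows 0..n-1,
# the same effect as calling .reset_index(drop=True) afterwards.
result = df.explode('measure', ignore_index=True)

# Reorder columns to match the question's transformed_df layout.
result = result[['measure', 'value']]
print(result)
```

Each list element becomes its own row with the row's value repeated; the empty list for 'y' becomes a single row with NaN in measure, which is how explode documents its handling of empty list-likes.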