Reputation: 73
I was wondering if it is possible to create a dataframe from a list of lists, where each item in index_list is attached as an index to each value in the corresponding inner list of lst:
index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]
Thank you for any help!!
Edit: the inner lists are not necessarily the same size.
Upvotes: 0
Views: 44
Reputation: 20669
You can use pd.Series.explode here.
pd.Series(lst,index=index_list).explode()
phase1 a
phase1 b
phase1 c
phase2 d
phase2 e
phase2 f
phase2 g
phase3 h
phase3 i
phase3 j
dtype: object
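Since the question asks for a dataframe rather than a Series, the exploded result can be converted afterwards. A minimal sketch using the question's data; the names 'phase' and 'value' are my own placeholders, not something from the question:

```python
import pandas as pd

index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]

# One row per inner-list item; the phase label is repeated as the index.
s = pd.Series(lst, index=index_list).explode()

# Keep the phase labels as the index of a one-column DataFrame...
df_indexed = s.rename('value').to_frame()

# ...or move them into an ordinary column instead.
# 'phase' and 'value' are hypothetical column names.
df_flat = s.rename_axis('phase').reset_index(name='value')
```

Which of the two shapes is more useful probably depends on whether you want to select by phase with .loc (first form) or group and filter on a regular column (second form).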
Another solution uses np.repeat and np.concatenate:
r_len = [len(r) for r in lst]
pd.Series(np.concatenate(lst), index=np.repeat(index_list,r_len))
phase1 a
phase1 b
phase1 c
phase2 d
phase2 e
phase2 f
phase2 g
phase3 h
phase3 i
phase3 j
dtype: object
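For anyone copying the NumPy version, here is a self-contained sketch with the imports it needs, plus a quick check that it yields the same labels and values as explode on the question's sample data. The variable names s_np, s_explode, same_values and same_labels are just for illustration:

```python
import numpy as np
import pandas as pd

index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]

# Length of each inner list; these may differ from one list to the next.
r_len = [len(r) for r in lst]

# Flatten the values and repeat each label once per item in its list.
s_np = pd.Series(np.concatenate(lst), index=np.repeat(index_list, r_len))

# The explode-based result, for comparison.
s_explode = pd.Series(lst, index=index_list).explode()

# Compare as plain lists so that dtype differences do not matter.
same_values = list(s_np) == list(s_explode)
same_labels = list(s_np.index) == list(s_explode.index)
```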
Timeit results:
In [501]: %%timeit
...: pd.Series(lst,index=index_list).explode()
...:
...:
363 µs ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [503]: %%timeit
...: r_len = [len(r) for r in lst]
...: pd.Series(np.concatenate(lst), index=np.repeat(index_list,r_len))
...:
...:
236 µs ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 3
Reputation: 33
This problem looks similar to R's expand.grid() function, which is listed in this pandas cookbook (bottom of the page).
This function lets you create a dataframe with all combinations of the given input values.
First define a function:
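import itertools
import pandas as pd

def expand_grid(data_dict):
    # Cartesian product of all the value lists, one row per combination
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())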
Then you can use it like so:
df = expand_grid({'index': ['phase1', 'phase2', 'phase3'],
'Col1': [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]})
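It may be worth checking what this call returns on the question's data. Because itertools.product pairs every label with every inner list, the sketch below gives 9 rows (3 labels x 3 lists), with each whole list stored as a single cell, which is not the same as one row per item under its own phase. The rows and columns are wrapped in list() here only as a defensive choice of mine:

```python
import itertools
import pandas as pd

def expand_grid(data_dict):
    # Cartesian product of all the value lists, one row per combination.
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(list(rows), columns=list(data_dict.keys()))

df = expand_grid({'index': ['phase1', 'phase2', 'phase3'],
                  'Col1': [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]})

# Every label is combined with every inner list.
n_rows, n_cols = df.shape

# Each cell of 'Col1' holds a whole inner list, not a single item.
first_cell = df.loc[0, 'Col1']
```

If the goal is one row per item with its own phase label, the explode or np.repeat approaches from the other answer seem to be the closer fit.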
Upvotes: 1