Reputation: 131
I've got a dictionary that looks like this:
data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'),
                     ('func2_arg1',),
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'),
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')]}
I'd like to convert it into a pandas DataFrame and make it look like this:
function_name argument types A B
func1 func1_arg1 func1_type1 value_a1 b
func1 func1_arg2 func1_type2 value_a1 b
func2 func2_arg1 func2_type1 value_a2 b
func3 func3_arg1 func3_type1 value_a3 b
func3 func3_arg2 func3_type2 value_a3 b
func3 func3_arg3 func3_type3 value_a3 b
As explained here, if there were only one column of tuples, I would do this:
import pandas as pd
data_frame = pd.DataFrame(data)
new_frame = (data_frame.set_index(['function_name', 'A', 'B'])['argument']
             .apply(pd.Series).stack().to_frame('argument')
             .reset_index().drop(columns='level_3'))  # drop the tuple-position level
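For comparison, here is a minimal sketch of the same single-column reshaping with DataFrame.explode (available since pandas 0.25), assuming only the 'argument' column holds tuples. explode treats tuples as list-likes, emits one row per element and repeats the values of the other columns:

```python
import pandas as pd

data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'),
                     ('func2_arg1',),
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b'}  # scalar, broadcast to every row by the constructor

data_frame = pd.DataFrame(data)
# One output row per tuple element; the other columns are repeated.
# reset_index(drop=True) replaces the repeated original index with 0..5.
new_frame = data_frame.explode('argument').reset_index(drop=True)
print(new_frame)
```

No set_index/stack/drop round trip is needed here, because explode keeps the non-exploded columns as they are.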
But how do I go about it if I've got several columns of tuples?
EDIT:
There seems to be a small problem with the accepted solution: if a tuple column consists entirely of Nones, or just of empty tuples, that column gets dropped while new_frame is being built. Is it possible to make pandas keep such columns?
The initial data looks like this:
data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'),
                     ('func2_arg1',),
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'),
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')],
        'info': [(None, None), (None,), (None, None, None)]}
The 'info' column could also be [(), (), ()]; the outcome would be the same.
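A hedged sketch, assuming pandas 1.3 or later (where DataFrame.explode accepts a list of columns): exploding all tuple columns together keeps the all-None 'info' column, because explode emits the None elements as ordinary values instead of dropping them. According to the pandas documentation, the listed columns must have matching element counts in each row, so this variant does not cover the [(), (), ()] case; there it should raise a ValueError.

```python
import pandas as pd

data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'),
                     ('func2_arg1',),
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'),
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')],
        'info': [(None, None), (None,), (None, None, None)]}

tuple_cols = ['argument', 'types', 'info']
data_frame = pd.DataFrame(data)
# Explode the tuple columns in lockstep (pandas >= 1.3); None elements are kept.
new_frame = (data_frame.explode(tuple_cols)
             .reset_index(drop=True)
             [['function_name', 'argument', 'types', 'info', 'A', 'B']])
print(new_frame)
```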
Upvotes: 2
Views: 566
Reputation: 30605
Since there are multiple columns to expand, I don't think this can be done in a single line, but you can use apply with the pd.DataFrame constructor. The default value of dropna for the stack method is True, so set it to False to keep the None values, i.e.:
index = ['function_name', 'A', 'B']
new_frame = (data_frame.set_index(index)
             .apply(lambda x: pd.DataFrame(x.values.tolist()).stack(dropna=False), axis=1)
             .stack(dropna=False).reset_index().drop(columns='level_3'))
new_frame.columns = index + [x for x in data_frame.columns if x not in index]
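To see what the lambda does, here is a small sketch for a single row, using illustrative values taken from func1. pd.DataFrame(x.values.tolist()) turns each tuple into a row (one per field) with the tuple positions as columns; transposing that intermediate frame already gives one row per position. The names row, inner and per_position are hypothetical and only for illustration.

```python
import pandas as pd

# One row of the tuple columns, as apply(..., axis=1) would pass it (illustrative values)
row = pd.Series({'argument': ('func1_arg1', 'func1_arg2'),
                 'types': ('func1_type1', 'func1_type2'),
                 'info': (None, None)})

# Each tuple becomes one row; tuple positions become columns 0, 1, ...
inner = pd.DataFrame(row.values.tolist())
print(inner)

# Transposing gives one row per tuple position and one column per field
per_position = inner.T
per_position.columns = row.index
print(per_position)
```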
  function_name         A  B    argument        types
0         func1  value_a1  b  func1_arg1  func1_type1
1         func1  value_a1  b  func1_arg2  func1_type2
2         func2  value_a2  b  func2_arg1  func2_type1
3         func3  value_a3  b  func3_arg1  func3_type1
4         func3  value_a3  b  func3_arg2  func3_type2
5         func3  value_a3  b  func3_arg3  func3_type3
With three columns to expand:
data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'),
                     ('func2_arg1',),
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'),
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')],
        'info': [(None, None), (None,), (None, None, None)]}
  function_name         A  B    argument  info        types
0         func1  value_a1  b  func1_arg1  None  func1_type1
1         func1  value_a1  b  func1_arg2  None  func1_type2
2         func2  value_a2  b  func2_arg1  None  func2_type1
3         func3  value_a3  b  func3_arg1  None  func3_type1
4         func3  value_a3  b  func3_arg2  None  func3_type2
5         func3  value_a3  b  func3_arg3  None  func3_type3
Hope it helps.
Upvotes: 3
Reputation: 107577
Consider nested list and dict comprehensions with the DataFrame constructor, provided all list items have equal length (here, 3). The only challenge is the scalar item 'B': 'b', which can be assigned at the end if known in advance:
# One small frame per function; the len > 1 filter skips the scalar 'B': 'b'
dfs = [pd.DataFrame({k: v[i] for k, v in data.items() if len(v) > 1})
       for i in range(len(data['function_name']))]
df = pd.concat(dfs).reset_index(drop=True).assign(B='b')
print(df)
# A argument function_name types B
# 0 value_a1 func1_arg1 func1 func1_type1 b
# 1 value_a1 func1_arg2 func1 func1_type2 b
# 2 value_a2 func2_arg1 func2 func2_type1 b
# 3 value_a3 func3_arg1 func3 func3_type1 b
# 4 value_a3 func3_arg2 func3 func3_type2 b
# 5 value_a3 func3_arg3 func3 func3_type3 b
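A sketch that generalizes the comprehension idea, assuming the tuple columns are known by name: it builds plain row dicts with itertools.zip_longest, which pads shorter or empty tuples with None, so an all-None or all-empty 'info' column survives instead of being dropped. Any value that is not a list (like 'B': 'b') is broadcast, rather than relying on the len > 1 filter. expand_rows is a hypothetical helper name, not a pandas function.

```python
from itertools import zip_longest

import pandas as pd


def expand_rows(data, tuple_cols, key='function_name'):
    """Hypothetical helper: one output row per tuple position of each record."""
    rows = []
    for i in range(len(data[key])):
        # Per-record lists are indexed; scalars such as 'B': 'b' are broadcast.
        base = {k: (v[i] if isinstance(v, list) else v)
                for k, v in data.items() if k not in tuple_cols}
        # zip_longest pads shorter (or empty) tuples with None instead of truncating.
        for values in zip_longest(*(data[c][i] for c in tuple_cols)):
            rows.append({**base, **dict(zip(tuple_cols, values))})
    return pd.DataFrame(rows)


data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'),
                     ('func2_arg1',),
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'),
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')],
        'info': [(None, None), (None,), (None, None, None)]}

df = expand_rows(data, ['argument', 'types', 'info'])
print(df)

# With empty tuples, the 'info' column is still kept (filled with None).
df_empty = expand_rows({**data, 'info': [(), (), ()]}, ['argument', 'types', 'info'])
print(df_empty)
```

Because the rows are plain dicts, the column order follows the input dict (non-tuple columns first), and no stack or drop call is involved.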
Upvotes: 2