BigBear

Reputation: 131

Converting tuples into rows from numerous columns in a pandas DataFrame

I've got a dictionary that looks like this:

data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'), 
                     ('func2_arg1',), 
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'), 
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')]}

I'd like to convert it into a pandas DataFrame and make it look like this:

function_name    argument    types         A          B

func1            func1_arg1  func1_type1   value_a1   b
func1            func1_arg2  func1_type2   value_a1   b
func2            func2_arg1  func2_type1   value_a2   b
func3            func3_arg1  func3_type1   value_a3   b
func3            func3_arg2  func3_type2   value_a3   b
func3            func3_arg3  func3_type3   value_a3   b

As follows from here, if there were just one column of tuples, I would have to do this:

import pandas as pd


data_frame = pd.DataFrame(data)
new_frame = (data_frame.set_index(['function_name', 'A', 'B'])['argument']
             .apply(pd.Series).stack()
             .to_frame('argument').reset_index().drop('level_3', 1))

But how do I go about it if I've got a few columns of tuples?

EDIT:

There seems to be a small problem with the accepted solution. Namely, if there's a tuple column consisting entirely of Nones, or just of empty tuples, then it gets dropped in the process of forming the new_frame. Is it possible to make pandas avoid dropping such columns?

The initial data looks like this:

data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'), 
                     ('func2_arg1',), 
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'), 
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')],
        'info': [(None, None), (None,), (None, None, None)]}

The 'info' column could be [(), (), ()]; the outcome would still be the same.
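A minimal sketch of the dropping behaviour (assuming it stems from stack(), which discards missing values by default):

```python
import pandas as pd

# a single tuple column holding only Nones
s = pd.Series([(None, None)])

# expanding the tuples and stacking silently drops the Nones
expanded = s.apply(pd.Series).stack()
print(len(expanded))  # 0 -- every value was dropped

# passing dropna=False keeps them
kept = s.apply(pd.Series).stack(dropna=False)
print(len(kept))  # 2
```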

Upvotes: 2

Views: 566

Answers (2)

Bharath M Shetty

Reputation: 30605

Since there are multiple columns to expand, I don't think this can be done in a single line, but you can use apply with the pd.DataFrame constructor. The default value of dropna for the stack method is True, so set it to False to keep the None values, i.e.

index = ['function_name', 'A', 'B']
new_frame = (data_frame.set_index(index)
             .apply(lambda x: pd.DataFrame(x.values.tolist()).stack(dropna=False), 1)
             .stack(dropna=False).reset_index().drop('level_3', 1))
new_frame.columns = index + [x for x in data_frame.columns if x not in index]
  function_name         A  B    argument        types
0         func1  value_a1  b  func1_arg1  func1_type1
1         func1  value_a1  b  func1_arg2  func1_type2
2         func2  value_a2  b  func2_arg1  func2_type1
3         func3  value_a3  b  func3_arg1  func3_type1
4         func3  value_a3  b  func3_arg2  func3_type2
5         func3  value_a3  b  func3_arg3  func3_type3

With three columns to expand

data = {'function_name': ['func1', 'func2', 'func3'],
    'argument': [('func1_arg1', 'func1_arg2'), 
                 ('func2_arg1',), 
                 ('func3_arg1', 'func3_arg2', 'func3_arg3')],
    'A': ['value_a1', 'value_a2', 'value_a3'],
    'B': 'b',
    'types': [('func1_type1', 'func1_type2'), 
              ('func2_type1',),
              ('func3_type1', 'func3_type2', 'func3_type3')],
    'info': [(None, None), (None,), (None, None, None)]}
  function_name         A  B    argument  info        types
0         func1  value_a1  b  func1_arg1  None  func1_type1
1         func1  value_a1  b  func1_arg2  None  func1_type2
2         func2  value_a2  b  func2_arg1  None  func2_type1
3         func3  value_a3  b  func3_arg1  None  func3_type1
4         func3  value_a3  b  func3_arg2  None  func3_type2
5         func3  value_a3  b  func3_arg3  None  func3_type3
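On newer pandas (1.3+, where DataFrame.explode accepts a list of columns), this can be done more directly; a sketch assuming all tuple columns have matching element counts per row:

```python
import pandas as pd

data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'),
                     ('func2_arg1',),
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'),
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')],
        'info': [(None, None), (None,), (None, None, None)]}

df = pd.DataFrame(data)
# explode all tuple columns at once; the scalar columns A and B are
# broadcast per row, and None elements survive instead of being dropped
out = df.explode(['argument', 'types', 'info']).reset_index(drop=True)
print(out)
```

Note that explode raises a ValueError if the per-row element counts of the listed columns differ, so this only applies when the tuple columns line up as they do here.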

Hope it helps.

Upvotes: 3

Parfait

Reputation: 107577

Consider nested list and dict comprehensions with the DataFrame constructor, provided all items are of equal length (i.e., 3). The only challenge is the scalar item 'B': 'b', which can be assigned at the end if known in advance:

dfs = [pd.DataFrame({k: v[i] for k, v in data.items() if len(data[k]) > 1})
       for i in range(len(data['function_name']))]

df = pd.concat(dfs).reset_index(drop=True).assign(B='b') 

print(df)
#           A    argument function_name        types  B
# 0  value_a1  func1_arg1         func1  func1_type1  b
# 1  value_a1  func1_arg2         func1  func1_type2  b
# 2  value_a2  func2_arg1         func2  func2_type1  b
# 3  value_a3  func3_arg1         func3  func3_type1  b
# 4  value_a3  func3_arg2         func3  func3_type2  b
# 5  value_a3  func3_arg3         func3  func3_type3  b

Upvotes: 2
