Jason Strimpel
Jason Strimpel

Reputation: 15466

Pandas list of tuples to MultiIndex

I have a DataFrame that looks like this:

    id    t_l
0   100   [('a', 1), ('b', 2)]
1   151   [('x', 4), ('y', 3)]

I need to return a DataFrame that looks like this:

    id    f    g
0   100  'a'   1
1        'b'   2
2   151  'x'   4
3        'y'   3

What is the best approach to this?

Upvotes: 1

Views: 681

Answers (5)

Mihir Verma
Mihir Verma

Reputation: 340

new = pd.DataFrame(columns=['id','f','g'])
for ind, row in df.iterrows():
    x, y = row
    for i in range(len(y)):
        (a,b) = y[i]
        if i==0:
            new.loc[str(ind)] = x,a,b
        else:
            new.loc[str(ind)+str(i)] = '',a,b

This will unpack the t_1 elements systematically and output the dataframe in a structure you want.

Output:

     id  f  g
0   100  a  1
01       b  2
1   151  x  4
11       y  3

Caution: It is a simple approach, however can be significantly more costly if used for large amounts of data.

Upvotes: 1

jezrael
jezrael

Reputation: 862581

Use nested list comprehension with zip:

zipped = zip(df['id'], df['t_l'])
df = pd.DataFrame([(i, y1, y2) for i, x in zipped for y1, y2 in x], columns=['id','f','g'])
print (df)
    id  f  g
0  100  a  1
1  100  b  2
2  151  x  4
3  151  y  3

If need MultiIndex Series:

zipped = zip(df['id'], df['t_l'])
s = pd.Series({(i, y1):y2 for i, x in zipped for y1, y2 in x})
print (s)
100  a    1
     b    2
151  x    4
     y    3
dtype: int64

Upvotes: 1

user11779478
user11779478

Reputation:

Use apply() AND pd.multiindex :-

df = pd.DataFrame(data = [ [100, [('a', 1), ('b', 2)] ] , [151,[('x', 4), ('y', 3)]] 
], columns=['id', 't_1'] )


inside, outside, value = [], [], []
c = 0
def get_outside(x):
    global c
    length = len( df['t_1'][c] )
    c+=1
    for var in range(length):
        outside.append(x)

def get_inside_value(y):
    for var in y:
        inside.append(var[0])
        value.append(var[1])


df['id'].apply(get_outside)
df['t_1'].apply(get_inside_value)

hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)
new_df = pd.DataFrame(value ,index=hier_index,columns=['g',])
new_df.index.names = ['id','f']
new_df

Output

            g
  id    f   
 100    a   1
        b   2
 151    x   4
        y   3

Upvotes: 1

Andy L.
Andy L.

Reputation: 25239

Edit: @ALollz makes a good point about speed of np.concatenate vs. chain.from_iterable(df.t_l). I %timeit and it is true. Therefore, I added solution using from_iterable(df.t_l)

from itertools import chain
pd.DataFrame(chain.from_iterable(df.t_l), index=np.repeat(df.id, df.t_l.str.len()), \
                                          columns=['f', 'g']).reset_index()

Original:

I would construct a new df using np.concatenate for data and np.repeat for index. Finally, reset_index to put id back to column

pd.DataFrame(np.concatenate(df.t_l), index=np.repeat(df.id, df.t_l.str.len()), \
                                              columns=['f', 'g']).reset_index()

Out[596]:
    id  f  g
0  100  a  1
1  100  b  2
2  151  x  4
3  151  y  3

Upvotes: 2

iamklaus
iamklaus

Reputation: 3770

df.set_index('id').apply(lambda x: pd.Series([list(i) for i in list(x.t_l)]), axis=1).stack().apply(pd.Series).rename({0:'f',1:'g'}, axis=1).reset_index(level=1, drop=True)

OR.. removing the pd.Series approach for efficiency

pd.DataFrame(df.set_index('id').apply(lambda x: pd.Series([list(i) for i in list(x.t_l)]), axis=1).stack()).rename({0:'f',1:'g'}, axis=1).reset_index(level=1, drop=True)

Output

     f  g
id       
100  a  1
100  b  2
151  x  4
151  y  3

Upvotes: 1

Related Questions