Chuck
Chuck

Reputation: 3852

Split Column of list of Tuples from Apply to New Columns

I have a dataframe that looks like this:

df = pd.DataFrame(
    {'tod':    [[('a',10),('b',6),('h',3),('p',2)], 
                [('x',11),('e',2),('l',2)], 
                [('r',5),('l',5)], 
                [('n',15)]]})

                                 tod
0  [(a, 10), (b, 6), (h, 3), (p, 2)]
1          [(x, 11), (e, 2), (l, 2)]
2                   [(r, 5), (l, 5)]
3                          [(n, 15)]

I want to expand list of tuples to new columns to get:

                                 tod l1   n1  l2    n2 l3    n3  l4    n4
0  [(a, 10), (b, 6), (h, 3), (p, 2)]  a   10   b     6  h     3   p     2
1          [(x, 11), (e, 2), (l, 2)]  x   11   e     2  l     2   -   nan
2                   [(r, 5), (l, 5)]  r    5   l     5  -   nan   -   nan
3                          [(n, 15)]  n   15   -   nan  -   nan   -   nan

I want to end up with missing values in the corresponding columns if the tuples do not exist.

I'm having trouble as the length of each list (number of tuples) within each cell is different for each row, so I want to dynamically assign new column values as they appear. Also each cell contains a list of tuple pairs, not a single tuple.

I tried something like in this question, but this only allowed expansion of a single tuple into multiple columns (when you know the columns before hand).

Then i looked at this and this but again, the number of columns is unknown so I ended up with:

pd.DataFrame.from_records([{k: v for v, k in row} for row in df.tod])
Out[171]: 
    2    3    5    6    10   11   15
0    p    h  NaN    b    a  NaN  NaN
1    l  NaN  NaN  NaN  NaN    x  NaN
2  NaN  NaN    l  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN  NaN    n

Then I looked at splitting cells which contain tuples here and here which looked at converting the tuples to a series, but again, this did not work, as these examples only deal with single tuples of known length, not lists of tuples

How can I approach this problem?

Note: I realise I've not put up a lot of code for "What have you tried" - my console is a heap of garbage that has yielded continuous errors. In the interests of keeping this clean I've left this off.

Upvotes: 1

Views: 745

Answers (2)

Sunitha
Sunitha

Reputation: 12015

n = max([len(row) for row in df.tod])
f = lambda l: sum(l, ()) + ('-', np.nan)*(n-len(l))
l = [list(f(row)) for row in df.tod]

ndf = pd.DataFrame(l,columns='l1   n1  l2    n2 l3    n3  l4    n4'.split())
#  l1  n1 l2   n2 l3   n3 l4   n4
#0  a  10  b  6.0  h  3.0  p  2.0
#1  x  11  e  2.0  l  2.0  -  NaN
#2  r   5  l  5.0  -  NaN  -  NaN
#3  n  15  -  NaN  -  NaN  -  NaN

df.join(ndf)
#
#                                 tod l1  n1 l2   n2 l3   n3 l4   n4
#0  [(a, 10), (b, 6), (h, 3), (p, 2)]  a  10  b  6.0  h  3.0  p  2.0
#1          [(x, 11), (e, 2), (l, 2)]  x  11  e  2.0  l  2.0  -  NaN
#2                   [(r, 5), (l, 5)]  r   5  l  5.0  -  NaN  -  NaN
#3                          [(n, 15)]  n  15  -  NaN  -  NaN  -  NaN

Upvotes: 0

jezrael
jezrael

Reputation: 862591

You can flatten tuples and then create columns names by generator, last join to original DataFrame:

#https://stackoverflow.com/a/45122198/2901002
def mygen(lst):
    for item in lst:
        yield 'l{}'.format(item)
        yield 'n{}'.format(item)

df1 = pd.DataFrame([[b for a in row for b in a] for row in df.tod])
df1.columns = list(mygen(range(1, len(df1.columns) // 2 + 1)))
print(df1)
  l1  n1    l2   n2    l3   n3    l4   n4
0  a  10     b  6.0     h  3.0     p  2.0
1  x  11     e  2.0     l  2.0  None  NaN
2  r   5     l  5.0  None  NaN  None  NaN
3  n  15  None  NaN  None  NaN  None  NaN

df = df.join(df1)
print (df)
                                 tod l1  n1    l2   n2    l3   n3    l4   n4
0  [(a, 10), (b, 6), (h, 3), (p, 2)]  a  10     b  6.0     h  3.0     p  2.0
1          [(x, 11), (e, 2), (l, 2)]  x  11     e  2.0     l  2.0  None  NaN
2                   [(r, 5), (l, 5)]  r   5     l  5.0  None  NaN  None  NaN
3                          [(n, 15)]  n  15  None  NaN  None  NaN  None  NaN

Upvotes: 2

Related Questions