Adding column to pandas dataframe taking values from list in other column

Question

I'm new to Python so I'm sorry if terminology is not correct; I've searched for similar posts but didn't find anything helpful for my case. I have a dataframe like this:

    Column1     Column2
0   0001        [('A','B'),('C','D'),('E','F')]
1   0001        [('A','B'),('C','D'),('E','F')]
2   0001        [('A','B'),('C','D'),('E','F')]
3   0002        [('G','H'),('I','J')]
4   0002        [('G','H'),('I','J')]

Each row is replicated n times based on the number of tuples contained in the list of Column2. What I'd like to do is to add a new column containing only one tuple per row:

Column1     Column2                             Column2_new
0   0001        [('A','B'),('C','D'),('E','F')]     'A' 'B'
1   0001        [('A','B'),('C','D'),('E','F')]     'C' 'D'
2   0001        [('A','B'),('C','D'),('E','F')]     'E' 'F'
3   0002        [('G','H'),('I','J')]               'G' 'H'
4   0002        [('G','H'),('I','J')]               'I' 'J'

Can you please help me with this?

Thanks in advance for any suggestion

sammywemmy · Accepted Answer

data = {'Column1':["0001"]*3 + ["0002"]*2,
        'Column2':[[('A','B'),('C','D'),('E','F')]]*3 + [[('G','H'),('I','J')]]*2
       }
df = pd.DataFrame(data)

print(df)


    Column1       Column2
0   0001    [(A, B), (C, D), (E, F)]
1   0001    [(A, B), (C, D), (E, F)]
2   0001    [(A, B), (C, D), (E, F)]
3   0002    [(G, H), (I, J)]
4   0002    [(G, H), (I, J)]

M = df.drop_duplicates('Column1')
print(M)

    Column1     Column2
0   0001    [(A, B), (C, D), (E, F)]
3   0002    [(G, H), (I, J)]

pd.concat([df,M.Column2.explode().reset_index(drop=True).rename('new')],axis=1)    

  Column1       Column2                   new
0   0001    [(A, B), (C, D), (E, F)]    (A, B)
1   0001    [(A, B), (C, D), (E, F)]    (C, D)
2   0001    [(A, B), (C, D), (E, F)]    (E, F)
3   0002    [(G, H), (I, J)]            (G, H)
4   0002    [(G, H), (I, J)]            (I, J)

Alternatively, you could use the itertools functions - product and chain to get ur data, and concat back to the original dataframe:

from itertools import product,chain
res = chain.from_iterable(product([first],last)
                          for first, last 
                          in zip(M.Column1, M.Column2))
out = pd.DataFrame(res,columns=['Column1','new'])
pd.concat((df,out.new),axis=1)

Adding column to pandas dataframe taking values from list in other column

Answers (2)

Related Questions