alex8501
alex8501

Reputation: 25

Adding column to pandas dataframe taking values from list in other column

I'm new to Python so I'm sorry if terminology is not correct; I've searched for similar posts but didn't find anything helpful for my case. I have a dataframe like this:

    Column1     Column2
0   0001        [('A','B'),('C','D'),('E','F')]
1   0001        [('A','B'),('C','D'),('E','F')]
2   0001        [('A','B'),('C','D'),('E','F')]
3   0002        [('G','H'),('I','J')]
4   0002        [('G','H'),('I','J')]

Each row is replicated n times based on the number of tuples contained in the list of Column2. What I'd like to do is to add a new column containing only one tuple per row:

Column1     Column2                             Column2_new
0   0001        [('A','B'),('C','D'),('E','F')]     'A' 'B'
1   0001        [('A','B'),('C','D'),('E','F')]     'C' 'D'
2   0001        [('A','B'),('C','D'),('E','F')]     'E' 'F'
3   0002        [('G','H'),('I','J')]               'G' 'H'
4   0002        [('G','H'),('I','J')]               'I' 'J'

Can you please help me with this?

Thanks in advance for any suggestion

Upvotes: 2

Views: 57

Answers (2)

sammywemmy
sammywemmy

Reputation: 28729

data = {'Column1':["0001"]*3 + ["0002"]*2,
        'Column2':[[('A','B'),('C','D'),('E','F')]]*3 + [[('G','H'),('I','J')]]*2
       }
df = pd.DataFrame(data)

print(df)


    Column1       Column2
0   0001    [(A, B), (C, D), (E, F)]
1   0001    [(A, B), (C, D), (E, F)]
2   0001    [(A, B), (C, D), (E, F)]
3   0002    [(G, H), (I, J)]
4   0002    [(G, H), (I, J)]

M = df.drop_duplicates('Column1')
print(M)

    Column1     Column2
0   0001    [(A, B), (C, D), (E, F)]
3   0002    [(G, H), (I, J)]

pd.concat([df,M.Column2.explode().reset_index(drop=True).rename('new')],axis=1)    

  Column1       Column2                   new
0   0001    [(A, B), (C, D), (E, F)]    (A, B)
1   0001    [(A, B), (C, D), (E, F)]    (C, D)
2   0001    [(A, B), (C, D), (E, F)]    (E, F)
3   0002    [(G, H), (I, J)]            (G, H)
4   0002    [(G, H), (I, J)]            (I, J)

Alternatively, you could use the itertools functions - product and chain to get ur data, and concat back to the original dataframe:

from itertools import product,chain
res = chain.from_iterable(product([first],last)
                          for first, last 
                          in zip(M.Column1, M.Column2))
out = pd.DataFrame(res,columns=['Column1','new'])
pd.concat((df,out.new),axis=1)

Upvotes: 1

anky
anky

Reputation: 75150

We can do df.lookup after groupby+cumcount

idx = df.groupby('Column1').cumcount()
df['new']= pd.DataFrame(df['Column2'].tolist()).lookup(df.index,idx)

print(df)
   Column1                   Column2     new
0        1  [(A, B), (C, D), (E, F)]  (A, B)
1        1  [(A, B), (C, D), (E, F)]  (C, D)
2        1  [(A, B), (C, D), (E, F)]  (E, F)
3        2          [(G, H), (I, J)]  (G, H)
4        2          [(G, H), (I, J)]  (I, J)

Upvotes: 2

Related Questions