assign tuple to cell in pandas

Question

I have a DataFrame like this:

date    time    job.filename    job.id  1,3,5-trimethylbenzene  1,3-butadiene   1,4-diaminobutane   1,5-diaminopentane  1,5-pentanedial 1-butanamine    ... nitrosopiperidine   nitrosopyrrolidine  pentanal    propanal    propylbenzene   propylene glycol methyl ether acetate   styrene tetrahydropyrrole   toluene xylenes + ethylbenzene
0   20161214    75506   IMAT list 1-3581-0-20161214-075506.csv  3581    NaN 0.1914  NaN NaN NaN NaN ... 0.5742  NaN NaN NaN NaN NaN NaN 0.3631  NaN NaN
1   20161214    80856   IMAT list 1-3585-0-20161214-080856.csv  3585    NaN 0.2353  NaN NaN NaN NaN ... 12.8760 NaN NaN NaN NaN NaN NaN 1.0447  NaN NaN

I would like to pair every value with the time of its row, so that each cell holds a (value, time) tuple:

date    time    job.filename    job.id  1,3,5-trimethylbenzene  1,3-butadiene   1,4-diaminobutane   1,5-diaminopentane  1,5-pentanedial 1-butanamine    ... nitrosopiperidine   nitrosopyrrolidine  pentanal    propanal    propylbenzene   propylene glycol methyl ether acetate   styrene tetrahydropyrrole   toluene xylenes + ethylbenzene
0   20161214    75506   IMAT list 1-3581-0-20161214-075506.csv  3581    NaN (0.1914,75506)  NaN NaN NaN NaN ... (0.5742,75506)  NaN NaN NaN NaN NaN NaN (0.3631,75506)  NaN NaN
1   20161214    80856   IMAT list 1-3585-0-20161214-080856.csv  3585    NaN (0.2353,80856)  NaN NaN NaN NaN ... (12.8760,80856) NaN NaN NaN NaN NaN NaN (1.0447,80856)  NaN NaN

I tried

headers=new.columns.tolist()
for i, row in new.iterrows():
    val=row[headers[4:]].get_values()
    time=row['time']
    k=[(value,time) for value in val]
    new.set_value(i,headers[4:],k)

but I get this error: ValueError: Must have equal len keys and value when setting with an ndarray

This is probably because the format of the cells changes from numbers to tuples. Can I modify my Series format to make this work?

Cheers

miradulo · Accepted Answer

I don't think there's any need for explicit iteration: you can select the value columns and zip the time onto each of them with df.apply. As an example,

>>> df
     time vals1  vals2          vals3
0  332903   foo      4  
1   42930   bar      3  

>>> df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: list(zip(df.time, x)))

>>> df
     time          vals1        vals2                    vals3
0  332903  (332903, foo)  (332903, 4)  (332903, )
1   42930   (42930, bar)   (42930, 3)   (42930, )
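
Note that the example above builds (time, value) tuples, while the question asks for (value, time). For a frame shaped like the one in the question, a minimal sketch might look like the following. It assumes the value columns are everything except the identifying columns, puts the value first, and leaves NaN cells as NaN rather than wrapping them. The small frame and the helper name `pair_with_time` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up miniature of the question's frame (only two value columns).
df = pd.DataFrame({
    "date": [20161214, 20161214],
    "time": [75506, 80856],
    "job.id": [3581, 3585],
    "1,3-butadiene": [0.1914, 0.2353],
    "toluene": [np.nan, np.nan],
})

# Assumption: every column that is not an identifier holds measurements.
id_cols = ["date", "time", "job.id"]
value_cols = [c for c in df.columns if c not in id_cols]

def pair_with_time(col):
    """Return an object Series of (value, time) tuples, keeping NaN as NaN."""
    cells = [
        (value, time) if pd.notna(value) else np.nan
        for value, time in zip(col, df["time"])
    ]
    return pd.Series(cells, index=col.index, dtype=object)

# Build the tuple columns separately, then join them back onto the identifiers.
paired = pd.DataFrame({c: pair_with_time(df[c]) for c in value_cols}, index=df.index)
result = df[id_cols].join(paired)

print(result)
```

Building a new frame instead of assigning back into the float columns sidesteps the dtype change from float to object, which is most likely what the original `set_value` call ran into.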
