jenkelblankel

Reputation: 155

Insert duplicate value into Pandas row

I want to split the 2017-01-31 cell in the "Jobs, Steve." row so that [SPGC-9456, 6.0] ends up on its own row, with the index label repeated.

What my code outputs now:

                                                  2017-01-31           2017-02-01
Gates, Bill.                             [[SPGC-14075, 0.5]]                  NaN
Jobs, Steve.           [[SPGC-14075, 3.5], [SPGC-9456, 6.0]]                  NaN
White, John ANDERSON.                   [[SPGC-14075, 1.75]]  [[SPGC-9456, 1.75]]

What I want:

                                 2017-01-31           2017-02-01
Gates, Bill.            [[SPGC-14075, 0.5]]                  NaN
Jobs, Steve.             [[SPGC-14075, 3.5]                  NaN
Jobs, Steve.              [SPGC-9456, 6.0]]                  NaN
White, John ANDERSON.  [[SPGC-14075, 1.75]]  [[SPGC-9456, 1.75]]

Upvotes: 1

Views: 423

Answers (2)

BENY

Reputation: 323226

I'm not using your data; here is the approach on some sample data that you can adapt.

import numpy as np
import pandas as pd

Temp = pd.DataFrame({'Index': ['str1', 'str2', 'str3'],
                     'va': [['x'], [['y'], ['z']], ['z']],
                     'va2': [np.nan, np.nan, ['YY']]}).set_index('Index')
# Emit one (index label, element) pair per element of each list in 'va'
Temp_unnest = pd.DataFrame([[i, x]
              for i, y in Temp['va'].apply(list).items()
                  for x in y], columns=list('IV'))
# Carry 'va2' over by looking up each original index label
Temp_unnest['va2'] = Temp_unnest.I.map(Temp.va2)
Temp_unnest.set_index('I', inplace=True)
Temp_unnest.columns = Temp.columns

Temp_unnest
Out[121]: 
       va   va2
I              
str1    x   NaN
str2  [y]   NaN
str2  [z]   NaN
str3    z  [YY]

Upvotes: 1

piRSquared

Reputation: 294258

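col = '2017-01-31'
v = df[col].values.tolist()             # cell contents as plain Python lists
l = [len(x) for x in v]                 # how many output rows each input row needs
d = {col: [[x] for y in v for x in y]}  # flatten, re-wrapping each pair in its own list
# Repeat each index label len(x) times, then overwrite the column with the flattened values
df.reindex(df.index.repeat(l)).assign(**d)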

                                 2017-01-31           2017-02-01
Gates, Bill.            [[SPGC-14075, 0.5]]                  NaN
Jobs, Steve.            [[SPGC-14075, 3.5]]                  NaN
Jobs, Steve.             [[SPGC-9456, 6.0]]                  NaN
White, John ANDERSON.  [[SPGC-14075, 1.75]]  [[SPGC-9456, 1.75]]

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame([
        [[['SPGC-14075', .5]], np.nan],
        [[['SPGC-14075', 3.5], ['SPGC-9456', 6.]], np.nan],
        [[['SPGC-14075', 1.75]], [['SPGC-9456', 1.75]]]
    ], 
    'Gates, Bill.|Jobs, Steve.|White, John ANDERSON.'.split('|'),
    ['2017-01-31', '2017-02-01']
)
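
On pandas 0.25 or later, `DataFrame.explode` may be a shorter route to the same result, since it repeats the index label and the other columns for every element of a list cell. The following is only a sketch under some assumptions: it rebuilds the same sample frame as the Setup, it assumes every cell in the target column is a list (anything else is passed through untouched), and the name `exploded` is purely illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the Setup
df = pd.DataFrame(
    [
        [[['SPGC-14075', 0.5]], np.nan],
        [[['SPGC-14075', 3.5], ['SPGC-9456', 6.0]], np.nan],
        [[['SPGC-14075', 1.75]], [['SPGC-9456', 1.75]]],
    ],
    index=['Gates, Bill.', 'Jobs, Steve.', 'White, John ANDERSON.'],
    columns=['2017-01-31', '2017-02-01'],
)

col = '2017-01-31'

# One output row per [ticket, hours] pair; the index label and the
# other column are duplicated for each new row.
exploded = df.explode(col)

# explode() strips the outer list, so re-wrap each pair to keep the
# original [[ticket, hours]] cell shape. Non-list cells are left as-is.
exploded[col] = [
    [pair] if isinstance(pair, list) else pair
    for pair in exploded[col]
]

print(exploded)
```

Only the column passed to `explode` is split; the `2017-02-01` cell for "White, John ANDERSON." stays a single nested list.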

Upvotes: 2
