akleefel
akleefel

Reputation: 87

pivoting dataframe with two columns of lists

I have a Dataframe like:

matrix = [(222, ['A','B','C'], [1,2,3]),
         (333, ['A','B','D'], [1,3,5])]

df = pd.DataFrame(matrix, columns=['timestamp', 'variable', 'value'])
timestamp     variable         value   

222           ['A','B','C']    [1,2,3]
333           ['A','B','D']    [1,3,5]

and would like to pivot it so that the timestamp value is kept, the unique values in the variable column become additional columns, and values from value are sorted in the respective columns.

The output should look as follows:

timestamp   A    B    C    D 

222         1    2    3    nan
333         1    3    nan  5 

any help would be greatly appreciated! :)

Upvotes: 4

Views: 173

Answers (3)

jezrael
jezrael

Reputation: 863166

Create dictionary with zip, pass to DataFrame constructor:

a = [dict(zip(*x)) for x in zip(df['variable'], df['value'])]
print (a)
[{'A': 1, 'B': 2, 'C': 3}, {'A': 1, 'B': 3, 'D': 5}]

df = df[['timestamp']].join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0

If many another columns use DataFrame.pop for extract columns:

a = [dict(zip(*x)) for x in zip(df.pop('variable'), df.pop('value'))]

df = df.join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0

Upvotes: 3

BENY
BENY

Reputation: 323316

Using unnest first , then just pivot

unnesting(df,['variable','value']).pivot(*df.columns)
Out[79]: 
variable     A    B    C    D
timestamp                    
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how='left')

Upvotes: 3

Haleemur Ali
Haleemur Ali

Reputation: 28303

You can pass the values & column names to a the pd.Series constructor. This will automatically expand the values in your desired shape.

df.set_index('timestamp').apply(lambda row: pd.Series(row.value, index=row.variable), axis=1)

# outputs:
             A    B    C    D
timestamp
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0

Upvotes: 2

Related Questions