Bazzert
Bazzert

Reputation: 97

melt pandas df where the variable name is that cell's (row name + column name)

I am trying to reshape a dataframe into 2 columns for submission to the wikipedia web traffic kaggle competition. Right now I have a 145063x60 df with index names as the agent and the column names as the prediction datetime. The target shape is a melted df with each row being an "agent"+"datetime" and the predicted value.

To make this a general problem statement. I am trying to melt a dataframe with each cell having a unique variable name equal to that cell's index name and column name.

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame(np.random.randint(0,100,size=(4, 3)), columns=list('ABC'))
    >>> df

        A   B   C
    0  61   9  54
    1  37  58  76
    2  46   9  32
    3  13  57  54

Convert to...

    >>> A  = [str(x)+'A' for x in range(4)]
    >>> B  = [str(x)+'B' for x in range(4)]
    >>> C  = [str(x)+'C' for x in range(4)]
    >>> df = df.melt()
    >>> df.variable = (A+B+C)
    >>> df

       variable  value
    0        0A     61
    1        1A     37
    2        2A     46
    3        3A     13
    4        0B      9
    5        1B     58
    6        2B      9
    7        3B     57
    8        0C     54
    9        1C     76
    10       2C     32
    11       3C     54

Upvotes: 2

Views: 1058

Answers (3)

BENY
BENY

Reputation: 323226

One way

s=df.stack().sort_index(level=1)
s.index=s.index.map('{0[0]}{0[1]}'.format)
s.reset_index()
Out[831]: 
   index   0
0     0A  97
1     1A  20
2     2A  80
3     3A  50
4     0B   2
5     1B  49
6     2B  37
7     3B  31
8     0C  29
9     1C  23
10    2C  28
11    3C  23

Upvotes: 3

Vaishali
Vaishali

Reputation: 38415

Slightly different way, first reset_index and melt.

new_df = df.reset_index().melt(id_vars = 'index')
new_df['variable'] = new_df.iloc[:, :2].astype(str).sum(1)
new_df.drop('index', 1)

variable    value
0   0A  61
1   1A  37
2   2A  46
3   3A  13
4   0B  9
5   1B  58
6   2B  9
7   3B  57
8   0C  54
9   1C  76
10  2C  32
11  3C  54

Upvotes: 2

user3483203
user3483203

Reputation: 51175

Using melt + cumcount

u = df.melt()
v = u.groupby('variable').cumcount().astype(str)
u.assign(variable=v + u.variable)

   variable  value
0        0A     61
1        1A     37
2        2A     46
3        3A     13
4        0B      9
5        1B     58
6        2B      9
7        3B     57
8        0C     54
9        1C     76
10       2C     32
11       3C     54

If you want a more useful way to view this data, maybe consider keeping it as its own column, for easier lookup.

df.assign(counter=np.arange(df.shape[0])).melt('counter')

    counter variable  value
0         0        A     61
1         1        A     37
2         2        A     46
3         3        A     13
4         0        B      9
5         1        B     58
6         2        B      9
7         3        B     57
8         0        C     54
9         1        C     76
10        2        C     32
11        3        C     54

Upvotes: 3

Related Questions