Reputation: 97
I am trying to reshape a dataframe into 2 columns for submission to the wikipedia web traffic kaggle competition. Right now I have a 145063x60 df with index names as the agent and the column names as the prediction datetime. The target shape is a melted df with each row being an "agent"+"datetime" and the predicted value.
To make this a general problem statement. I am trying to melt a dataframe with each cell having a unique variable name equal to that cell's index name and column name.
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0,100,size=(4, 3)), columns=list('ABC'))
>>> df
A B C
0 61 9 54
1 37 58 76
2 46 9 32
3 13 57 54
Convert to...
>>> A = [str(x)+'A' for x in range(4)]
>>> B = [str(x)+'B' for x in range(4)]
>>> C = [str(x)+'C' for x in range(4)]
>>> df = df.melt()
>>> df.variable = (A+B+C)
>>> df
variable value
0 0A 61
1 1A 37
2 2A 46
3 3A 13
4 0B 9
5 1B 58
6 2B 9
7 3B 57
8 0C 54
9 1C 76
10 2C 32
11 3C 54
Upvotes: 2
Views: 1058
Reputation: 323226
One way
s=df.stack().sort_index(level=1)
s.index=s.index.map('{0[0]}{0[1]}'.format)
s.reset_index()
Out[831]:
index 0
0 0A 97
1 1A 20
2 2A 80
3 3A 50
4 0B 2
5 1B 49
6 2B 37
7 3B 31
8 0C 29
9 1C 23
10 2C 28
11 3C 23
Upvotes: 3
Reputation: 38415
Slightly different way, first reset_index and melt.
new_df = df.reset_index().melt(id_vars = 'index')
new_df['variable'] = new_df.iloc[:, :2].astype(str).sum(1)
new_df.drop('index', 1)
variable value
0 0A 61
1 1A 37
2 2A 46
3 3A 13
4 0B 9
5 1B 58
6 2B 9
7 3B 57
8 0C 54
9 1C 76
10 2C 32
11 3C 54
Upvotes: 2
Reputation: 51175
Using melt
+ cumcount
u = df.melt()
v = u.groupby('variable').cumcount().astype(str)
u.assign(variable=v + u.variable)
variable value
0 0A 61
1 1A 37
2 2A 46
3 3A 13
4 0B 9
5 1B 58
6 2B 9
7 3B 57
8 0C 54
9 1C 76
10 2C 32
11 3C 54
If you want a more useful way to view this data, maybe consider keeping it as its own column, for easier lookup.
df.assign(counter=np.arange(df.shape[0])).melt('counter')
counter variable value
0 0 A 61
1 1 A 37
2 2 A 46
3 3 A 13
4 0 B 9
5 1 B 58
6 2 B 9
7 3 B 57
8 0 C 54
9 1 C 76
10 2 C 32
11 3 C 54
Upvotes: 3