Reputation: 1065

Melting pandas data frame with multiple variable names and multiple value names

How can I melt a pandas data frame using multiple variable names and values? I have the following data frame that changes its shape in a for loop. In one of the for loop iterations, it looks like this:

ID  Cat    Class_A   Class_B     Prob_A     Prob_B
1   Veg      1        2          0.9         0.1
2   Veg      1        2          0.8         0.2
3   Meat     1        2          0.6         0.4
4   Meat     1        2          0.3         0.7
5   Veg      1        2          0.2         0.8

I need to melt it in such a way that it looks like this:

ID  Cat    Class     Prob    
1   Veg      1       0.9       
1   Veg      2       0.1
2   Veg      1       0.8        
2   Veg      2       0.2
3   Meat     1       0.6         
3   Meat     2       0.4
4   Meat     1       0.3         
4   Meat     2       0.7
5   Veg      1       0.2         
5   Veg      2       0.8

During the for loop the data frame will contain different number of classes with their probabilities. That is why I am looking for a general approach that is applicable in all my for loop iterations. I saw this question and this but they were not helpful!

Upvotes: 11

Answers (5)

sammywemmy

Reputation: 28644

One option is pivot_longer from pyjanitor, which abstracts the process, and is efficient:

# pip install janitor
import janitor

df.pivot_longer(
    index = ['ID', 'Cat'], 
    names_to = '.value', 
    names_pattern = '([a-zA-Z]+)_*')

   ID   Cat  Class  Prob
0   1   Veg      1   0.9
1   2   Veg      1   0.8
2   3  Meat      1   0.6
3   4  Meat      1   0.3
4   5   Veg      1   0.2
5   1   Veg      2   0.1
6   2   Veg      2   0.2
7   3  Meat      2   0.4
8   4  Meat      2   0.7
9   5   Veg      2   0.8

The idea for this particular reshape is that whatever group in the regular expression is paired with the .value stays as the column header.

Upvotes: 0

Ian Thompson

Reputation: 3285

You could also use pd.melt.

# Make DataFrame
df = pd.DataFrame({'ID' :  [i for i in range(1,6)],
                   'Cat' : ['Veg']*2 + ['Meat']*2 + ['Veg'],
                   'Class_A' : [1]*5,
                   'Class_B' : [2]*5,
                   'Prob_A' : [0.9, 0.8, 0.6, 0.3, 0.2],
                   'Prob_B' : [0.1, 0.2, 0.4, 0.7, 0.8]})

# Make class dataframe and prob dataframe
df_class = df.loc[:, ['ID', 'Cat', 'Class_A', 'Class_B']]
df_prob = df.loc[:, ['ID', 'Cat', 'Prob_A', 'Prob_B']]

# Melt class dataframe and prob dataframe
df_class = df_class.melt(id_vars = ['ID',
                                    'Cat'],
                         value_vars = ['Class_A',
                                       'Class_B'],
                         value_name = 'Class')
df_prob = df_prob.melt(id_vars = ['ID',
                                  'Cat'],
                       value_vars = ['Prob_A',
                                     'Prob_B'],
                       value_name = 'Prob')

# Clean variable column so only 'A','B' is left in both dataframes
df_class.loc[:, 'variable'] = df_class.loc[:, 'variable'].str.partition('_')[2]
df_prob.loc[:, 'variable'] = df_prob.loc[:, 'variable'].str.partition('_')[2]

# Merge class dataframe with prob dataframe on 'ID', 'Cat', and 'variable';
# drop 'variable'; sort values by 'ID', 'Cat'
final = df_class.merge(df_prob,
                       how = 'inner',
                       on = ['ID',
                             'Cat',
                             'variable']).drop('variable', axis = 1).sort_values(by = ['ID',
                                                                                       'Cat'])

Upvotes: 0

jezrael

Reputation: 862591

You need lreshape by dict for specify categories:

d = {'Class':['Class_A', 'Class_B'], 'Prob':['Prob_A','Prob_B']}
df = pd.lreshape(df,d)
print (df)
    Cat  ID  Class  Prob
0   Veg   1      1   0.9
1   Veg   2      1   0.8
2  Meat   3      1   0.6
3  Meat   4      1   0.3
4   Veg   5      1   0.2
5   Veg   1      2   0.1
6   Veg   2      2   0.2
7  Meat   3      2   0.4
8  Meat   4      2   0.7
9   Veg   5      2   0.8

More dynamic solution:

Class = [col for col in df.columns if col.startswith('Class')]
Prob = [col for col in df.columns if col.startswith('Prob')]
df = pd.lreshape(df, {'Class':Class, 'Prob':Prob})
print (df)
    Cat  ID  Class  Prob
0   Veg   1      1   0.9
1   Veg   2      1   0.8
2  Meat   3      1   0.6
3  Meat   4      1   0.3
4   Veg   5      1   0.2
5   Veg   1      2   0.1
6   Veg   2      2   0.2
7  Meat   3      2   0.4
8  Meat   4      2   0.7
9   Veg   5      2   0.8

EDIT:

lreshape is now undocumented, but is possible in future will by removed (with pd.wide_to_long too).

Possible solution is merging all 3 functions to one - maybe melt, but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.

Upvotes: 13

Ted Petrou

Reputation: 61947

The top voted answer uses the undocumented lreshape which may at some point get deprecated because of its similarity to pd.wide_to_long which is documented and can use directly here. By default suffix matches only to numbers. You must change this to match characters (here I just used any character).

pd.wide_to_long(df, stubnames=['Class', 'Prob'], i=['ID', 'Cat'], j='DROPME', suffix='.')\
  .reset_index()\
  .drop('DROPME', axis=1)

   ID   Cat  Class  Prob
0   1   Veg      1   0.9
1   1   Veg      2   0.1
2   2   Veg      1   0.8
3   2   Veg      2   0.2
4   3  Meat      1   0.6
5   3  Meat      2   0.4
6   4  Meat      1   0.3
7   4  Meat      2   0.7
8   5   Veg      1   0.2
9   5   Veg      2   0.8

Upvotes: 1

BENY

Reputation: 323226

Or you can try this by using str.contain and pd.concat

DF1=df2.loc[:,df2.columns.str.contains('_A|Cat|ID')]
name=['ID','Cat','Class','Prob']
DF1.columns=name
DF2=df2.loc[:,df2.columns.str.contains('_B|Cat|ID')]
DF2.columns=name
pd.concat([DF1,DF2],axis=0)

Out[354]: 
   ID   Cat  Class  Prob
0   1   Veg      1   0.9
1   2   Veg      1   0.8
2   3  Meat      1   0.6
3   4  Meat      1   0.3
4   5   Veg      1   0.2
0   1   Veg      2   0.1
1   2   Veg      2   0.2
2   3  Meat      2   0.4
3   4  Meat      2   0.7
4   5   Veg      2   0.8

Upvotes: 4

Melting pandas data frame with multiple variable names and multiple value names

Answers (5)

Related Questions