Reputation: 440

Data frame with duplicates in a column map to another column

import pandas as pd
df = pd.DataFrame({
    
    "col1": [11, 12, 13],
    "col2": [21, 22, 23],
    "col3": [31, 32, 33],
    "col4": [41, 42, 43],
})

I have a Pandas data frame like df above, and I would like to reshape df to look like the following.

import pandas as pd
df = pd.DataFrame({

    "col1": [11, 12, 13, 11, 12, 13, 11, 12, 13],
    "col2": [21, 22, 23, 31, 32, 33, 41, 42, 43],
    "indx": ["col2", "col2", "col2", "col3", "col3", "col3", "col4", "col4", "col4"]
})

I can slice up df and get my desired data frame, but what would be the slick, Pythonic way to do it in Pandas?

EDIT

I'm realizing that my question is more complicated that I originall realized, but not too much more (I think). Again, I have a data frame.

import pandas as pd
df = pd.DataFrame({
    
    "col1": [11, 12, 13],
    "col2": [21, 22, 23],
    "col3": [31, 32, 33],
    "col4": [41, 42, 43],
    "col5": [51, 52, 53],
    "col6": [61, 62, 63]
})

I want to do something like melt to get my data frame to be like this:

import pandas as pd
df = pd.DataFrame({

    "col1": [11, 12, 13, 11, 12, 13, 11, 12, 13],
    "colA": [21, 22, 23, 31, 32, 33, 41, 42, 43],
    "indx": ["do", "do", "do", "re", "re", "re", "me", "me", "me"],
    "col4": [41, 42, 43, 41, 42, 43, 41, 42, 43],
    "col5": [51, 52, 53, 51, 52, 53, 51, 52, 53],
    "col6": [61, 62, 63, 61, 62, 63, 61, 62, 63]
})

So I want to be able to set the strings to which the "indx" etc are set; I want to drag along several other columns the way that I drag along "col1", and I want to set the name of the new "col2" column header.

Thanks!

Upvotes: 1

Answers (3)

sammywemmy

Reputation: 28709

Create a duplicate of col4, melt the data and replace the values :

(
    df.assign(temp=df.col4)
    .melt(id_vars=["col1", "col4", "col5", "col6"],
          var_name="indx",
          value_name="colA")
    .replace({"col2": "do", "col3": "re", "temp": "me"})
    .iloc[:, [0, -1, -2, 1, 2, 3]]
)

   col1 colA    indx    col4    col5    col6
0   11  21      do      41      51       61
1   12  22      do      42      52       62
2   13  23      do      43      53       63
3   11  31      re      41      51       61
4   12  32      re      42      52       62
5   13  33      re      43      53       63
6   11  41      me      41      51       61
7   12  42      me      42      52       62
8   13  43      me      43      53       63

Upvotes: 1

Andy L.

Reputation: 25259

You need melt, merge and replace

d = {'col2': 'do', 'col3': 're', 'col4': 'me'}

df_final = (df.melt(['col1','col5','col6'], var_name="indx", value_name="colA")
              .merge(df[['col1','col4']], how='left').replace(d))
    
Out[522]:
   col1  col5  col6 indx  colA  col4
0    11    51    61   do    21    41
1    12    52    62   do    22    42
2    13    53    63   do    23    43
3    11    51    61   re    31    41
4    12    52    62   re    32    42
5    13    53    63   re    33    43
6    11    51    61   me    41    41
7    12    52    62   me    42    42
8    13    53    63   me    43    43

Or you may rename columns before melt

d = {'col2': 'do', 'col3': 're', 'col4': 'me'}

df_final = (df.rename(d, axis=1)
              .melt(['col1','col5','col6'], var_name="indx", value_name="colA")
              .merge(df[['col1','col4']], how='left'))

Out[529]:
   col1  col5  col6 indx  colA  col4
0    11    51    61   do    21    41
1    12    52    62   do    22    42
2    13    53    63   do    23    43
3    11    51    61   re    31    41
4    12    52    62   re    32    42
5    13    53    63   re    33    43
6    11    51    61   me    41    41
7    12    52    62   me    42    42
8    13    53    63   me    43    43

Upvotes: 2

Juan C

Reputation: 6132

This is basically what you're looking for:

df.set_index('col1').unstack()

Output:

      col1
col2  11      21
      12      22
      13      23
col3  11      31
      12      32
      13      33
col4  11      41
      12      42
      13      43

Your column names will be indexes which seems to be what you were looking for

Upvotes: 2

Data frame with duplicates in a column map to another column

Answers (3)

Output:

Related Questions