Jean Blin
Jean Blin

Reputation: 21

Pandas - Spreading different values in a column on many columns

I have the following table:

Option 1 Option 2 Option 3
A X 1
B X 1
A Y 1
C X 1
B Y 1

I need to split the values of each option on different columns based on the translation as follows: Option 1 values A or B becomes opt1000 with values opt1000-A and opt1000-B Option 1 value C becomes opt1001 with value opt1001-C Etc. as shown below:

Opt1000 Opt1001 Opt2000 Opt2001 Opt3000
opt1000-A NO-opt1001 opt2000-X NO-opt2001 1
opt1000-B NO-opt1001 opt2000-X NO-opt2001 1
opt1000-A NO-opt1001 NO-opt2000 opt-2001-y 1
NO-opt1000 opt1001-C opt2000-X NO-opt2001 1
opt-1000-B NO-opt1001 NO-opt2000 opt2001-Y 1

I have a translation file as follows:

Ancient Option name Ancient Value New option name New value
Option 1 A Opt1000 opt1000-A
Option 1 B Opt1000 opt1000-B
Option 1 C Opt1001 opt1001-C
Option 2 X opt2000 opt2000-X
Option 2 Y opt2001 opt2001-Y

Does anyone have an idea of how I could automate this to be performant? (My table has 20 columns and a 1000 lines).


I was thinking of first extracting unique option values to create a new dataframe with the new column names. (By checking the new option name in the translation table and adding the name to a set). Then filling the data frame with 0 (for example, or NaN), then placing the new values one by one in the new dataframe by scanning each value in the original dataframe. When all values are translated, the remaining 0 or NaN will bere placed by NO_new-option-name_.

Let me know if it is clear or need more details.

Upvotes: 2

Views: 49

Answers (1)

piRSquared
piRSquared

Reputation: 294258

The gist of this answer is that I'm going to go column by column and create dummy variables. I'll get a little fancy in how I fill out the text with Numpy array slicing.

from functools import reduce
import pandas as pd
import numpy as np

translator = {'A': '000', 'B': '000', 'C': '001',
              'X': '000', 'Y': '001', 1: '000', '1': '000'}
optifier = lambda name: name.replace('Option ', 'opt')

df_ = df.rename(columns=optifier)

def f(s):
    col = s.name
    translated = s.map(translator)
    mask = pd.get_dummies(translated).eq(1)
    a = np.empty(mask.shape, object)
    i0, j0 = np.where(mask)   # Indices where to drop option text
    i1, j1 = np.where(~mask)  # Indices where to drop absence of option
    a[i0, j0] = col+translated+'-'+s.astype(str)
    cols = col + mask.columns
    a[i1, j1] = 'NO-'+cols[j1]
    return pd.DataFrame(a, mask.index, cols.str.title())

reduce(pd.DataFrame.join, map(f, map(df_.get, df_)))

      Opt1000     Opt1001     Opt2000     Opt2001    Opt3000
0   opt1000-A  NO-opt1001   opt2000-X  NO-opt2001  opt3000-1
1   opt1000-B  NO-opt1001   opt2000-X  NO-opt2001  opt3000-1
2   opt1000-A  NO-opt1001  NO-opt2000   opt2001-Y  opt3000-1
3  NO-opt1000   opt1001-C   opt2000-X  NO-opt2001  opt3000-1
4   opt1000-B  NO-opt1001  NO-opt2000   opt2001-Y  opt3000-1

Upvotes: 1

Related Questions