jgg
jgg

Reputation: 831

Functional Programming: How does one create a new column to a dataframe that contains a multiindex column?

Suppose the below simplified dataframe. (The actual df is much, much bigger.) How does one assign values to a new column f such that f is a function of another column (e.,g. e)? I'm pretty sure one needs to use apply or map but never done this with a dataframe that has multiindex columns?

df = pd.DataFrame([[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b","e")))
df
    a       b
    d   e   d   e
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16

Desired output:

    a          b
    d   e   f  d   e   f
0   1   2   1  3   4   1
1   5   6   1  7   8  -1
2   9  10  -1  11  12 -1
3  13  14  -1  15  16 -1

Would like to be able to apply the following lines and assign them to a new column f. Two problems: First, the last line that contains the apply doesn't work but hopefully my intent is clear. Second, I'm unsure how to assign values to a new column of a dataframe with a multi index column structure. Would like to be able use functional programming methods.

lt = df.loc(axis=1)[:,'e'] < 8
gt = df.loc(axis=1)[:,'e'] >= 8
conditions = [lt, gt]
choices = [1, -1]
df.loc(axis=1)[:,'f'] = df.loc(axis=1)[:,'e'].apply(np.select(conditions, choices))

Upvotes: 2

Views: 140

Answers (1)

Onyambu
Onyambu

Reputation: 79318

nms = [(i, 'f')for i, j in df.columns if j == 'e']    
df[nms] = (df.iloc[:, [j == 'e' for i, j in df.columns]] < 8) * 2 - 1

df = df.sort_index(axis=1)
df
 a          b       
    d   e  f   d   e  f
0   1   2  1   3   4  1
1   5   6  1   7   8 -1
2   9  10 -1  11  12 -1
3  13  14 -1  15  16 -1

EDIT:

for a custom ordering:

d = {i:j for j, i in enumerate(df.columns.levels[0])}
df1 = df.loc[:, sorted(df.columns, key = lambda x: d[x[0]])]

IF the whole data is in a way symmetric, you could do:

df.stack(0).assign(f = lambda x: 2*(x.e < 8) - 1).stack().unstack([1,2])
Out[]: 
    a          b       
    d   e  f   d   e  f
0   1   2  1   3   4  1
1   5   6  1   7   8 -1
2   9  10 -1  11  12 -1
3  13  14 -1  15  16 -1

Upvotes: 4

Related Questions