Ranny
Ranny

Reputation: 315

Creating a new column based on condition of an index

I have df:

arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz', 'baz', 'baz', 'foo', 'foo', 'foo', 'foo', 'foo', 'foo']),
       np.array(['one','two'] * 9),
        np.array([1,2,3] * 6)]
df = pd.DataFrame(np.random.randn(18,2), index=arrays)


                      col0      col1
bar one     1   0.872359    -1.115871
    two     2   -0.937908   -0.528563
    one     3   -0.118874   0.286595
    two     1   -0.507698   1.364643
    one     2   1.507611    1.379498
    two     3   -1.398019   -1.603056
baz one     1   1.498263    0.412380
    two     2   -0.930022   -1.483657
    one     3   -0.438157   1.465089
    two     1   0.161887    1.346587
    one     2   0.167086    1.246322
    two     3   0.276344    -1.206415
foo one     1   -0.045389   -0.759927
    two     2   0.087999    -0.435753
    one     3   -0.232054   -2.221466
    two     1   -1.299483   1.697065
    one     2   0.612211    -1.076738
    two     3   -1.482573   0.907826

And now I want to create 'NEW' column that:

for 'bar'
  if index.level(2) > 1
     "NEW" = col1
  else
     "NEW" = col2

for 'baz' the same with >2

for 'foo' the same with >3

How to do it without Py loops?

Upvotes: 2

Views: 64

Answers (2)

jezrael
jezrael

Reputation: 862441

You can use get_level_values for select index values by levels and then for new column numpy.where:

#if possible use dictionary
d = {'bar':1, 'baz':2, 'foo':3}
m = df.index.get_level_values(2)  > df.rename(d).index.get_level_values(0)

df['NEW'] = np.where(m, df.col1, df.col2)

For more general solution useSeries.rank:

a = df.index.get_level_values(2) 
b = df.index.get_level_values(0).to_series().rank(method='dense')

df['NEW'] = np.where(a > b, df.col1, df.col2)

Detail:

print (b)
bar    1.0
bar    1.0
bar    1.0
bar    1.0
bar    1.0
bar    1.0
baz    2.0
baz    2.0
baz    2.0
baz    2.0
baz    2.0
baz    2.0
foo    3.0
foo    3.0
foo    3.0
foo    3.0
foo    3.0
foo    3.0
dtype: float64

Upvotes: 2

juanpa.arrivillaga
juanpa.arrivillaga

Reputation: 95873

I think you can avoid looping over df.level(0) by using df.factorize:

In [40]: thresh = pd.factorize(df.index.get_level_values(0).values)[0] + 1

In [41]: mask = df.index.get_level_values(2) > thresh

In [42]: df['NEW'] = np.where(mask, df.col1, df.col2)

In [43]: df
Out[43]:
               col1      col2       NEW
bar one 1  0.247222 -0.270104 -0.270104
    two 2  0.429196 -1.385352  0.429196
    one 3  0.782293 -1.565623  0.782293
    two 1  0.392214  1.023960  1.023960
    one 2 -1.628410 -0.484275 -1.628410
    two 3  0.256757  0.529373  0.256757
baz one 1 -0.568608 -0.776577 -0.776577
    two 2  2.142408 -0.815413 -0.815413
    one 3  0.860080  0.501965  0.860080
    two 1 -0.267029 -0.025360 -0.025360
    one 2  0.187145 -0.063436 -0.063436
    two 3  0.351296 -2.050649  0.351296
foo one 1  0.704941  0.176698  0.176698
    two 2 -0.380353  1.027745  1.027745
    one 3 -1.337364 -0.568359 -0.568359
    two 1 -0.588601 -0.800426 -0.800426
    one 2  1.513358 -0.616237 -0.616237
    two 3  0.244831  1.027109  1.027109

Upvotes: 1

Related Questions