Apply Numpy function over entire Dataframe

Question

I am applying this function over a dataframe df1 such as the following:

                          AA          AB             AC           AD  
2005-01-02 23:55:00      "EQUITY"    "EQUITY"      "EQUITY"     "EQUITY"   
2005-01-03 00:00:00        32.32      19.5299        32.32      31.0455   
2005-01-04 00:00:00      31.9075      19.4487      31.9075      30.3755   
2005-01-05 00:00:00      31.6151      19.5799      31.6151       29.971   
2005-01-06 00:00:00      31.1426      19.7174      31.1426      29.9647  

def func(x):
    for index, price in x.iteritems():
      x[index] = price / np.sum(x,axis=1)
    return x[index]

df3=func(df1.ix[1:])

However, I only get a single column returned, as opposed to 3:

    2005-01-03    0.955843
    2005-01-04    0.955233
    2005-01-05    0.955098
    2005-01-06    0.955773
    2005-01-07    0.955877
    2005-01-10     0.95606
    2005-01-11     0.95578
    2005-01-12    0.955621

I am guessing I am missing something in the formula to make it apply to the entire dataframe. Also, how could I return the first index that has strings in its row?

Ujjwal · Accepted Answer

You can do it the following way:

import numpy as np
import pandas as pd

def func(row):
    return row / np.sum(row)  # each value divided by its row total
df2 = pd.concat([df1[:1], df1[1:].apply(func, axis=1)], axis=0)

It has two steps:

df1[:1] extracts the first row, which contains the strings, while df1[1:] is the rest of the DataFrame. Concatenating the two afterwards puts the string row back on top, which answers the second part of your question.
To operate over rows, use the apply() method with axis=1, so that func receives one row at a time.
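As a rough illustration of the two steps, here is a minimal sketch on a small frame built from the first rows of the question's table. The names labels, prices, via_apply and via_div are made up for this example, and the astype(float) call assumes the price rows hold numeric values; because of the string row, pandas stores every column as object dtype. The div() line is a vectorized alternative to apply() that is not part of the original answer but should give the same numbers.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question: one row of strings followed by
# numeric price rows (values copied from the question's table).
df1 = pd.DataFrame(
    {
        "AA": ["EQUITY", 32.32, 31.9075],
        "AB": ["EQUITY", 19.5299, 19.4487],
        "AC": ["EQUITY", 32.32, 31.9075],
        "AD": ["EQUITY", 31.0455, 30.3755],
    },
    index=pd.to_datetime(
        ["2005-01-02 23:55:00", "2005-01-03 00:00:00", "2005-01-04 00:00:00"]
    ),
)

# Step 1: split off the string row; convert the remaining rows to float
# (assumption: everything below the first row is numeric).
labels = df1.iloc[:1]
prices = df1.iloc[1:].astype(float)

# Step 2a: row-wise apply, as in the answer.
via_apply = prices.apply(lambda row: row / np.sum(row), axis=1)

# Step 2b: vectorized alternative, dividing each row by its own sum.
via_div = prices.div(prices.sum(axis=1), axis=0)

# Put the string row back on top of the normalized rows.
df2 = pd.concat([labels, via_div], axis=0)
print(df2)
```

Both variants return a frame with all four columns, and each normalized row sums to 1. On large frames the div() form is usually the faster of the two, since it avoids calling a Python function once per row.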
