johnbaltis

Reputation: 1578

pandas, correctly handle numpy arrays inside a row element

I'll give a minimal example where I would create numpy arrays inside row elements of a pandas.DataFrame.

TL;DR: see the screenshot of the DataFrame

The code below finds the minimum of a function using scipy.optimize.brute. With full_output=True, brute returns the argument at which the minimum is found, the minimum value, and two numpy arrays: the evaluation grid and the function values on that grid.

import itertools

import numpy as np
import pandas as pd
import scipy.optimize

# brute calls the objective as f(params, *args): the first positional
# argument receives the grid value, followed by args=(r, x)
sin = lambda r, phi, x: r * np.sin(phi * x)

def func(r, x):
    # full_output=True also returns the evaluation grid and the values on it
    x0, fval, grid, Jout = scipy.optimize.brute(
        sin, ranges=[(-np.pi, np.pi)], args=(r, x), Ns=10, full_output=True)
    return dict(phi_at_min=x0[0], result_min=fval, phis=grid, result_at_grid=Jout)


rs = np.linspace(-1, 1, 10)
xs = np.linspace(0, 1, 10)

# every (r, x) combination
vals = list(itertools.product(rs, xs))

result = [func(r, x) for r, x in vals]

# idk whether this is the best way of generating the DataFrame, but it works
df = pd.DataFrame(vals, columns=['r', 'x'])
df = pd.concat((pd.DataFrame(result), df), axis=1)
df.head()

[Screenshot: output of df.head()]

I suspect this is not how it is supposed to be done, and that I should maybe expand the arrays somehow. How do I handle this in a correct, beautiful, and clean way?

Upvotes: 1

Views: 919

Answers (1)

Julien Marrec

Reputation: 11905

So, even though "beautiful and clean" is subject to interpretation, I'll give you mine, which should in turn give you some ideas. I'm using a MultiIndex on the columns so that you can later easily select pairs of phi/result_at_grid for each point in the evaluation grid. I'm also using apply instead of creating two dataframes.
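To show the core idea in isolation first, here is a toy sketch. It assumes a stand-in function `evaluate` (hypothetical, made up for illustration) in place of the scipy.optimize.brute call, and a 4-point grid instead of 10. When the function passed to apply(axis=1) returns a pd.Series whose labels are tuples, pandas should expand it into a DataFrame with MultiIndex columns:

```python
import numpy as np
import pandas as pd


def evaluate(row):
    """Hypothetical stand-in for the brute() call: evaluate on a 4-point grid."""
    grid = np.linspace(-np.pi, np.pi, 4)
    values = row["r"] * np.sin(grid * row["x"])
    # Tuple keys (name, grid_point) become a two-level MultiIndex on the Series
    data = {("phis", i): g for i, g in enumerate(grid)}
    data.update({("result_at_grid", i): v for i, v in enumerate(values)})
    return pd.Series(data)


toy = pd.DataFrame({"r": [1.0, 2.0], "x": [0.5, 1.0]})

# One output row per input row; the Series labels become the columns
wide = toy.apply(evaluate, axis=1)

print(wide.columns.nlevels)  # number of column levels
print(wide.shape)            # (rows, columns)
```

The full version below does the same thing with the real brute output, and additionally carries r, x and the two scalar results along in the returned Series.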

import itertools

import numpy as np
import pandas as pd
import scipy.optimize

# brute calls the objective as f(params, *args): the first positional
# argument receives the grid value, followed by args=(r, x)
sin = lambda r, phi, x: r * np.sin(phi * x)

def func(row):
    """
    Accept one row of the dataframe (a pd.Series), for use with
    df.apply(func, axis=1).
    Return a pd.Series with the initial (r, x) and the results.
    """
    r = row['r']
    x = row['x']
    x0, fval, grid, Jout = scipy.optimize.brute(
        sin, ranges=[(-np.pi, np.pi)], args=(r, x), Ns=10, full_output=True)

    # Create a multi-index series for the phis
    phis = pd.Series(grid)
    phis.index = pd.MultiIndex.from_product([['Phis'], phis.index])

    # Same for the results at the grid points
    result_at_grid = pd.Series(Jout)
    result_at_grid.index = pd.MultiIndex.from_product([['result_at_grid'], result_at_grid.index])

    # Concatenate both into a single series
    s = pd.concat([phis, result_at_grid])

    # Add the two float results
    s['phi_at_min'] = x0[0]
    s['result_min'] = fval

    # Add the initial r, x to reconstruct the index later
    s['r'] = r
    s['x'] = x

    return s


rs = np.linspace(-1, 1, 10)
xs = np.linspace(0, 1, 10)

vals = list(itertools.product(rs, xs))
df = pd.DataFrame(vals, columns=['r', 'x'])

# Apply func to each row (axis=1)
results = df.apply(func, axis=1)
results.set_index(['r','x'], inplace=True)
results.head().T # Transposing so we can see the output in one go...

[Image: output of results.head().T]

Now you can, for example, select all values at evaluation grid point 2:

print(results.swaplevel(0,1, axis=1)[2].head()) # Showing only the first 5 rows


                   Phis  result_at_grid
r    x                                 
-1.0 0.000000 -1.745329        0.000000
     0.111111 -1.745329        0.193527
     0.222222 -1.745329        0.384667
     0.333333 -1.745329        0.571062
     0.444444 -1.745329        0.750415
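
An alternative to swaplevel for this kind of selection is DataFrame.xs, which takes a cross-section on a given column level. The sketch below uses a small hand-built frame so that it runs on its own; the names `phi_frame`, `value_frame` and `wide` and all the numbers are made up for illustration and are not the results frame above:

```python
import numpy as np
import pandas as pd

# Toy data: 3 parameter rows, each evaluated on a 4-point grid (illustrative only)
index = pd.MultiIndex.from_tuples(
    [(-1.0, 0.0), (-1.0, 0.5), (1.0, 0.5)], names=["r", "x"])
grid = np.linspace(-np.pi, np.pi, 4)
phi_frame = pd.DataFrame(np.tile(grid, (3, 1)), index=index)
value_frame = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=index)

# The dict keys become the outer column level, the grid point number the inner one
wide = pd.concat({"phis": phi_frame, "result_at_grid": value_frame}, axis=1)

# Cross-section: every column whose inner level equals grid point 2
at_point_2 = wide.xs(2, axis=1, level=1)
print(at_point_2)
```

Building the two blocks as separate frames and joining them with pd.concat(..., axis=1) is also one way to get MultiIndex columns without apply, if the arrays are already available as 2-D data.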

Upvotes: 1
