Reputation: 33

using pd.DataFrame.apply to create multiple columns

My first question here!

I'm having some trouble figuring out what I'm doing wrong here. I'm trying to append columns to an existing pd.DataFrame object: specifically, my original dataframe has n columns, and I want to use apply to append an additional 2n columns to it. The problem seems to be that doing this via apply() doesn't work: if the applied function returns more than n values per row, it falls over. This doesn't make sense to me, and I was hoping somebody could either shed some light on why I'm seeing this behaviour or suggest a better approach.

For example,

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10,2))

def this_works(x):
    return 5 * x

def this_fails(x):
    return np.append(5 * x, 5 * x)

df.apply(this_works, axis=1)  # Two columns of output, as expected
df.apply(this_fails, axis=1)  # Unexpected failure...

Any ideas? I know there are other ways to create the data columns; this approach just seemed very natural to me, and I'm quite confused by the behaviour.

SOLVED! CT Zhu's solution below takes care of this: my error arose from not returning a pd.Series object from the applied function in the code above.
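A minimal sketch of the fix the SOLVED note describes: keep the np.append call from this_fails, but wrap its result in a pd.Series so that apply expands it into columns. The function name this_fails_fixed is hypothetical, chosen for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))

def this_fails_fixed(x):
    # np.append returns a flat ndarray; wrapping it in a pd.Series
    # (default index 0..3) lets DataFrame.apply expand it into four columns.
    return pd.Series(np.append(5 * x, 5 * x))

result = df.apply(this_fails_fixed, axis=1)
print(result.shape)  # (10, 4)
```

Columns 0 and 1 hold the first 5 * x copy and columns 2 and 3 the second.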

Upvotes: 2

Answers (2)

CT Zhu

Reputation: 54340

Are you trying to do a few different calculations on your df and put the resulting vectors together in one larger DataFrame, like in this example?

In [39]:

print(df)

          0         1
0  0.718003  0.241216
1  0.580015  0.981128
2  0.477645  0.463892
3  0.948728  0.653823
4  0.056659  0.366104
5  0.273700  0.062131
6  0.151237  0.479318
7  0.425353  0.076771
8  0.317731  0.029182
9  0.543537  0.589783

In [40]:

print(df.apply(lambda x: pd.Series(np.hstack((x*5, x*6))), axis=1))

          0         1         2         3
0  3.590014  1.206081  4.308017  1.447297
1  2.900074  4.905639  3.480088  5.886767
2  2.388223  2.319461  2.865867  2.783353
3  4.743640  3.269114  5.692369  3.922937
4  0.283293  1.830520  0.339951  2.196624
5  1.368502  0.310656  1.642203  0.372787
6  0.756187  2.396592  0.907424  2.875910
7  2.126764  0.383853  2.552117  0.460624
8  1.588656  0.145909  1.906387  0.175091
9  2.717685  2.948917  3.261222  3.538701
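Since the question is about appending columns to the existing frame, a sketch of the same pattern that gives the returned Series explicit labels and then joins the result back onto df. The labels x5_0, x5_1, x6_0, x6_1 and the helper name five_and_six are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))

def five_and_six(row):
    # Returning a Series with a unique index makes apply expand it into
    # columns; the index labels become the new column names.
    return pd.Series(np.hstack((row * 5, row * 6)),
                     index=["x5_0", "x5_1", "x6_0", "x6_1"])

new_cols = df.apply(five_and_six, axis=1)
out = df.join(new_cols)  # original 2 columns followed by the 4 new ones
print(out.shape)  # (10, 6)
```

Giving the new columns distinct labels avoids clashing with the original integer labels 0 and 1 when joining.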

Upvotes: 1

Andy Hayden

Reputation: 375485

FYI, in this trivial case you can simply do 5 * df!
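For arithmetic like this, a vectorised sketch that skips apply entirely: multiply the whole frame, or concatenate whole frames side by side to get the four-column result from the hstack example. Using ignore_index=True to relabel the columns 0..3 is a choice made here, not something from the thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))

# Same values as df.apply(this_works, axis=1), without a Python-level loop.
times5 = 5 * df

# Vectorised equivalent of the row-wise hstack/apply example:
# put df * 5 and df * 6 side by side; ignore_index=True relabels columns 0..3.
both = pd.concat([df * 5, df * 6], axis=1, ignore_index=True)
print(both.shape)  # (10, 4)
```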

I think the issue here is that np.append flattens the Series:

In [11]: np.append(df[0], df[0])
Out[11]:
array([ 0.33145275,  0.14964056,  0.86268119,  0.17311983,  0.29618537,
        0.48831228,  0.64937305,  0.03353709,  0.42883925,  0.99592229,
        0.33145275,  0.14964056,  0.86268119,  0.17311983,  0.29618537,
        0.48831228,  0.64937305,  0.03353709,  0.42883925,  0.99592229])
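A small sketch of that flattening on a single two-element row: np.append hands back a plain 1-D ndarray, so the Series index is gone. It also shows the result_type="expand" option, which assumes a pandas version where DataFrame.apply accepts result_type (0.23 or later); it asks apply to spread list-like results into columns.

```python
import numpy as np
import pandas as pd

row = pd.Series([0.5, 0.25], index=[0, 1])
flat = np.append(5 * row, 5 * row)
# flat is a plain ndarray [2.5, 1.25, 2.5, 1.25]: no index labels survive.

df = pd.DataFrame(np.random.rand(10, 2))
# Assumes pandas >= 0.23: result_type="expand" turns each returned
# array into a row of the output DataFrame.
expanded = df.apply(lambda x: np.append(5 * x, 5 * x), axis=1,
                    result_type="expand")
print(expanded.shape)  # (10, 4)
```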

What you want is for it to create four columns (isn't it?). axis=1 means that you are applying the function row-wise, i.e. x is each row, passed in as a Series...
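A quick sketch to see what x actually is with axis=1: each call receives one row as a Series whose .name is that row's label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))

# With axis=1 the function is called once per row; each row arrives as a
# Series indexed by the column labels, with .name set to the row label.
kinds = df.apply(lambda x: type(x).__name__, axis=1)
labels = df.apply(lambda x: x.name, axis=1)
print(kinds.unique())  # ['Series']
```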

In general you want apply to return either:

a single value, or
a Series (with unique index).
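A sketch of those two return shapes: a scalar per row gives back a Series, while a Series with a unique index per row gives back a DataFrame whose columns are that index. The labels lo and hi are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))

# Single value per row -> apply returns a Series (one entry per row).
row_sums = df.apply(lambda row: row.sum(), axis=1)

# Series with a unique index per row -> apply returns a DataFrame,
# with the Series index ("lo", "hi") becoming the column labels.
lo_hi = df.apply(lambda row: pd.Series({"lo": row.min(), "hi": row.max()}),
                 axis=1)
print(type(row_sums).__name__, type(lo_hi).__name__)  # Series DataFrame
```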

That said, I rather thought the following might work (to get four columns):

In [21]: df.apply((lambda x: pd.concat([x[0] * 5, x[0] * 5], axis=1)), axis=1)
TypeError: ('cannot concatenate a non-NDFrame object', u'occurred at index 0')

In [22]: df.apply(lambda x: np.array([1, 2, 3, 4]), axis=1)
ValueError: Shape of passed values is (10,), indices imply (10, 2)

In [23]: df.apply(lambda x: pd.Series([1, 2, 3, 4]), axis=1)  # works

I half expected the first to raise an error about a non-unique index... but I was surprised that the second failed.
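The In [21] attempt hits the TypeError because x[0] is the row's scalar value in column 0, not a Series, and pd.concat only accepts pandas objects. A sketch that concatenates the whole row Series instead, with ignore_index=True so each returned Series has the unique index 0..3 that the rule above asks for:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))

# x is the whole row (a Series indexed 0, 1); concatenating two copies and
# passing ignore_index=True yields a Series indexed 0..3, which apply
# expands into four columns.
four = df.apply(lambda x: pd.concat([x * 5, x * 5], ignore_index=True),
                axis=1)
print(four.shape)  # (10, 4)
```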

Upvotes: 0
