shah
shah

Reputation: 29

How to vectorize rowwise join

I've been given a dataframe df and a function foo(int x) that returns a Series. I'd like to join df with the result of foo() for each row using a vectorized operation.

For example, given the following dataframe, I call foo() on the values of column col_1

col_1 col_2 col_3
1 1 'a'
12 2 'b'
13 3 'd'
4 4 'c'

If we assume

then the output should be

col_1 col_2 col_3 col4 col5
1 1 'a' 0 2
12 2 'b' 1 3
13 3 'd' 1 4
4 4 'c' 0 5

Upvotes: 1

Views: 133

Answers (2)

Randy
Randy

Reputation: 14847

EDIT: Looks like .from_records will handle the map cleanly. You could try that with the pd.concat instead:

In [118]: pd.DataFrame.from_records(df['col_1'].map(foo))
Out[118]:
   col4  col5
0     0     2
1     1     3
2     1     4
3     0     5

Normally, I'd use .map() for something like this since it's generally faster than .apply(), but the output comes out a little funky, so unless you've got a giant dataframe, I'd just use the straightforward .apply() option with pd.concat:

In [18]: def foo(n):
    ...:     return {1: pd.Series({'col4': 0, 'col5': 2}), 12: pd.Series({'col4': 1, 'col5': 3}), 13: pd.Series({'col4': 1, 'col5': 4}), 4: pd.Series
    ...: ({'col4': 0, 'col5': 5})}[n]
    ...:

In [19]: df
Out[19]:
   col_1  col_2 col_3
0      1      1   'a'
1     12      2   'b'
2     13      3   'd'
3      4      4   'c'

In [20]: pd.concat([df, df['col_1'].apply(foo)], axis=1)
Out[20]:
   col_1  col_2 col_3  col4  col5
0      1      1   'a'     0     2
1     12      2   'b'     1     3
2     13      3   'd'     1     4
3      4      4   'c'     0     5

Another option you might try is to have the function return a dictionary rather than a Series

Upvotes: 2

Ian Graham
Ian Graham

Reputation: 376

Since your function can't take anything but a scalar, you're restricted to using Series.apply. In fact, it's almost as though your function was made exactly for this use case ...

Since if the func returns a Series object, the final output will be a DataFrame that is easy to join to your original one. From here you use pd.concat along the proper axis

pd.concat([df, df.iloc[:,0].apply(foo)], axis=1)

Upvotes: 0

Related Questions