Reputation: 29
I've been given a dataframe df
and a function foo(int x)
that returns a Series. I'd like to join df
with the result of foo()
for each row using a vectorized operation.
For example, given the following dataframe, I call foo()
on the values of column col_1
col_1 | col_2 | col_3 |
---|---|---|
1 | 1 | 'a' |
12 | 2 | 'b' |
13 | 3 | 'd' |
4 | 4 | 'c' |
If we assume
foo(1) = Series('col4': 0, 'col5': 2)
foo(12) = Series('col4': 1, 'col5': 3)
foo(13) = Series('col4': 1, 'col5': 4)
foo(4) = Series('col4': 0, 'col5': 5)
then the output should be
col_1 | col_2 | col_3 | col4 | col5 |
---|---|---|---|---|
1 | 1 | 'a' | 0 | 2 |
12 | 2 | 'b' | 1 | 3 |
13 | 3 | 'd' | 1 | 4 |
4 | 4 | 'c' | 0 | 5 |
Upvotes: 1
Views: 133
Reputation: 14847
EDIT: Looks like .from_records
will handle the map cleanly. You could try that with the pd.concat
instead:
In [118]: pd.DataFrame.from_records(df['col_1'].map(foo))
Out[118]:
col4 col5
0 0 2
1 1 3
2 1 4
3 0 5
Normally, I'd use .map()
for something like this since it's generally faster than .apply()
, but the output comes out a little funky, so unless you've got a giant dataframe, I'd just use the straightforward .apply()
option with pd.concat
:
In [18]: def foo(n):
...: return {1: pd.Series({'col4': 0, 'col5': 2}), 12: pd.Series({'col4': 1, 'col5': 3}), 13: pd.Series({'col4': 1, 'col5': 4}), 4: pd.Series
...: ({'col4': 0, 'col5': 5})}[n]
...:
In [19]: df
Out[19]:
col_1 col_2 col_3
0 1 1 'a'
1 12 2 'b'
2 13 3 'd'
3 4 4 'c'
In [20]: pd.concat([df, df['col_1'].apply(foo)], axis=1)
Out[20]:
col_1 col_2 col_3 col4 col5
0 1 1 'a' 0 2
1 12 2 'b' 1 3
2 13 3 'd' 1 4
3 4 4 'c' 0 5
Another option you might try is to have the function return a dictionary rather than a Series
Upvotes: 2
Reputation: 376
Since your function can't take anything but a scalar, you're restricted to using Series.apply
. In fact, it's almost as though your function was made exactly for this use case ...
Since if the func returns a Series object, the final output will be a DataFrame that is easy to join to your original one. From here you use pd.concat
along the proper axis
pd.concat([df, df.iloc[:,0].apply(foo)], axis=1)
Upvotes: 0