Sun

Reputation: 1955

Populate a new column in a pandas DataFrame using values from other columns

I have a function that should take x, y and z as input and return r as output. For example, my_func(x, y, z) takes x = 10, y = 'apple' and z = 2 and returns the value for column r. Similarly, it takes x = 20, y = 'orange' and z = 4 and populates column r. What would be an efficient way to do this?

Before:

   a  x       y       z      
   5  10   'apple'    2
   2  20   'orange'   4
   0  4    'apple'    2
   5  5    'pear'     6

After:

   a  x       y       z      r
   5  10   'apple'    2      x
   2  20   'orange'   4      x
   0  4    'apple'    2      x
   5  5    'pear'     6      x

Upvotes: 2

Views: 1365

Answers (1)

roman

Reputation: 117540

It depends on how complex your function is. In general, you can use pandas.DataFrame.apply:

>>> def my_func(x):
...     return '{0} - {1} - {2}'.format(x['y'],x['a'],x['x'])
... 
>>> df['r'] = df.apply(my_func, axis=1)
>>> df
   a   x         y  z                  r
0  5  10   'apple'  2   'apple' - 5 - 10
1  2  20  'orange'  4  'orange' - 2 - 20
2  0   4   'apple'  2    'apple' - 0 - 4
3  5   5    'pear'  6     'pear' - 5 - 5
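If the function takes the three values as separate arguments, as in the question, a small wrapper can unpack them from each row. This is only a sketch: the body of my_func is a hypothetical placeholder, since the question does not say how r is computed, and the y values are stored here without the literal quote characters.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [5, 2, 0, 5],
    "x": [10, 20, 4, 5],
    "y": ["apple", "orange", "apple", "pear"],
    "z": [2, 4, 2, 6],
})

def my_func(x, y, z):
    # Hypothetical placeholder logic; substitute the real computation of r.
    return "{0}-{1}".format(y, x * z)

# The lambda receives one row at a time and passes x, y, z on as arguments.
df["r"] = df.apply(lambda row: my_func(row["x"], row["y"], row["z"]), axis=1)
```

Column a is simply ignored here; only the columns named in the lambda are read.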

axis=1 makes your function run 'for each row' instead of 'for each column':

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1)
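To see the difference, here is a minimal sketch that reports the index of whatever Series apply hands to the function. The helper name index_labels is made up for illustration, and the frame is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [5, 2, 0, 5],
    "x": [10, 20, 4, 5],
    "y": ["apple", "orange", "apple", "pear"],
    "z": [2, 4, 2, 6],
})

def index_labels(series):
    # Join the labels of the Series that apply() passes in.
    return ", ".join(str(label) for label in series.index)

# axis=1: one call per row; each Series is indexed by the column names.
per_row = df.apply(index_labels, axis=1)

# axis=0: one call per column; each Series is indexed by the row labels.
per_column = df.apply(index_labels, axis=0)
```

With axis=1 every entry of per_row lists the column names, which is why x['y'] works inside my_func above. With axis=0 every entry of per_column lists the row labels instead.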

But if it's a really simple function, like the one above, you can probably do it without a function at all, using vectorized operations.
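For the string-building example, a vectorized version might look like the sketch below. It assumes y already holds strings (stored here without the literal quote characters) and converts the numeric columns with astype(str) before concatenating.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [5, 2, 0, 5],
    "x": [10, 20, 4, 5],
    "y": ["apple", "orange", "apple", "pear"],
    "z": [2, 4, 2, 6],
})

# Build the whole column at once instead of calling a function per row.
df["r"] = df["y"] + " - " + df["a"].astype(str) + " - " + df["x"].astype(str)
```

On large frames this usually runs faster than apply with axis=1, because the work is done column by column rather than in a Python-level loop over rows.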

Upvotes: 3
