LostinSpatialAnalysis

Reputation: 641

How to apply function to all rows in data frame?

I am confused about how to apply a function to a data frame. When I write user-defined functions, I am used to ending with a single "return" value. In this case, though, I need the returned value to appear in every cell of a data frame column, and I can't figure out how. The function is based on "if" and "else" conditional statements, and I am unsure how to apply it to my data frame. I may be missing a parenthesis or bracket somewhere, but I am not sure. I explain below.

I have the following dataframe:

       Day      No_employee?       No_machinery?      Production_potential
---------------------------------------------------------------------------
0    Day 1                 1                   0                         5      
1    Day 2                 1                   1                         4
2    Day 3                 0                   1                         3
3    Day 4                 1                   0                         8
4    Day 5                 0                   0                         6
5    Day 6                 0                   1                         3
6    Day 7                 0                   0                         5
7    Day 8                 1                   1                         2
...

Now I want to take my dataframe and append a new column called Production_lost, based on the following logic:

In a factory, to manufacture products, you need both 1) an employee present, and 2) functioning machinery. If you cannot produce any product, then that potential product becomes lost product.

For each day (thinking about a factory), if No_employee? is true ( = 1), then no products can be made, regardless of No_machinery? and Production_lost = Production_potential. If No_machinery? is true ( = 1), then no products can be made, regardless of No_employee?, and Production_lost = Production_potential. Only if No_employee? and No_machinery? both = 0, will Production_lost = 0. If you have both an employee present and functioning machinery, there will be no production loss.

So I have the following code:

df['Production_loss'] = df['No_employee?'].apply(lambda x: df['Production_potential'] if x == 1.0 else df['Production_potential'] * df['No_machinery?'])

which produces the following error message:

ValueError: Wrong number of items passed 70, placement implies 1

I think this means that too many values are being assigned to a single column, but I am not sure how I ended up here or how to address it. Is there a simple fix?

The dataframe I am trying to produce would look like this:

       Day      No_employee?       No_machinery?      Production_potential     Production_lost
-----------------------------------------------------------------------------------------------
0    Day 1                 1                   0                         5                   5
1    Day 2                 1                   1                         4                   4
2    Day 3                 0                   1                         3                   3  
3    Day 4                 1                   0                         8                   8
4    Day 5                 0                   0                         6                   0
5    Day 6                 0                   1                         3                   3
6    Day 7                 0                   0                         5                   0
7    Day 8                 1                   1                         2                   2
...

Upvotes: 0

Views: 107

Answers (2)

mozway

Reputation: 260400

There is no need for apply here; use pd.Series.where instead:

df['Production_loss'] = df['Production_potential'].where(df['No_employee?'].eq(1), df['Production_potential'] * df['No_machinery?'])
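To see how this behaves, here is a minimal runnable sketch on the eight sample rows from the question. It assumes the two flag columns hold only the integers 0 and 1; the intermediate name `fallback` is introduced here for readability and is not part of the original answer.

```python
import pandas as pd

# The eight sample rows shown in the question.
df = pd.DataFrame({
    "Day": [f"Day {i}" for i in range(1, 9)],
    "No_employee?": [1, 1, 0, 1, 0, 0, 0, 1],
    "No_machinery?": [0, 1, 1, 0, 0, 1, 0, 1],
    "Production_potential": [5, 4, 3, 8, 6, 3, 5, 2],
})

# Value used where the condition is False (an employee is present):
# the potential when machinery is down, 0 when it is running.
fallback = df["Production_potential"] * df["No_machinery?"]

# Series.where keeps the original value where the condition is True
# and takes the replacement value everywhere else.
df["Production_lost"] = df["Production_potential"].where(
    df["No_employee?"].eq(1), fallback
)
print(df)
```

Because `where` works on whole columns at once, every row is handled without an explicit loop or `apply`.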

You can also multiply the potential by a boolean mask that is True when either flag is set:

df['Production_loss'] = (df['No_employee?'].eq(1) | df['No_machinery?'].eq(1)) * df['Production_potential']
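As a sanity check on the mask-times-potential idea, the sketch below runs it on the four possible flag combinations (Days 1, 2, 3 and 5 from the question). It assumes 0/1 flags and builds the mask from boolean comparisons rather than integer arithmetic; the names `flags`, `potential`, `blocked` and `lost` are illustrative only.

```python
import pandas as pd

# One row per flag combination, taken from Days 1, 2, 3 and 5.
flags = pd.DataFrame({
    "No_employee?": [1, 1, 0, 0],
    "No_machinery?": [0, 1, 1, 0],
})
potential = pd.Series([5, 4, 3, 6])

# True when production is blocked by a missing employee OR missing machinery.
blocked = flags["No_employee?"].eq(1) | flags["No_machinery?"].eq(1)

# Casting the mask to int gives 1 for blocked days and 0 otherwise,
# so the product is the potential on blocked days and 0 on the rest.
lost = blocked.astype(int) * potential
print(lost.tolist())
```

Only the last row, where both flags are 0, ends up with no loss.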

Upvotes: 1

It_is_Chris

Reputation: 14073

Use numpy.where:

import numpy as np

# Lost production equals the potential when either flag is set, otherwise 0.
df['Production_lost'] = np.where((df['No_employee?'] == 1) | (df['No_machinery?'] == 1),
                                 df['Production_potential'], 0)

     Day  No_employee?  No_machinery?  Production_potential  Production_lost
0  Day 1             1              0                     5                5
1  Day 2             1              1                     4                4
2  Day 3             0              1                     3                3
3  Day 4             1              0                     8                8
4  Day 5             0              0                     6                0
5  Day 6             0              1                     3                3
6  Day 7             0              0                     5                0
7  Day 8             1              1                     2                2
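For completeness, the row-by-row form that the question title asks about can be sketched with DataFrame.apply and axis=1. Calling apply on a single column passes one scalar at a time, and the lambda in the question returns a whole column for each scalar, which is why the assignment fails. The function name `production_lost` below is hypothetical, and this version will generally be slower than the vectorized approaches on large frames.

```python
import pandas as pd

# The eight sample rows shown in the question (Day column omitted for brevity).
df = pd.DataFrame({
    "No_employee?": [1, 1, 0, 1, 0, 0, 0, 1],
    "No_machinery?": [0, 1, 1, 0, 0, 1, 0, 1],
    "Production_potential": [5, 4, 3, 8, 6, 3, 5, 2],
})


def production_lost(row):
    """Return the day's potential as lost if an employee or machinery is missing."""
    if row["No_employee?"] == 1 or row["No_machinery?"] == 1:
        return row["Production_potential"]
    return 0


# axis=1 hands each row to the function as a Series, so it returns one scalar per row.
df["Production_lost"] = df.apply(production_lost, axis=1)
print(df)
```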

Upvotes: 1
