A pandas dataframe column to pass as an argument of row level function to apply each value of the column to its respective row

Question

![enter image description here][1]

I have a weird requirement to add a column similar to the below oracle query to apply on python dataframe. A pandas dataframe column to pass as an argument of row level function to apply each value of the column to its respective row.

select employee_id, first_name, last_name, substr(first_name, employee_id/employee_id, 3) test1 from hr.EMPLOYEES

The result has been attached in the below csv file. I want to add a new column. So I wrote my pandas statement as below

EMPLOYEE.assign(test1=EMPLOYEE.FIRST_NAME.str.slice(EMPLOYEE.EMPLOYEE_ID.apply(lambda x: x/x),4))

In this code I am using EMPLOYEE.EMPLOYEE_ID inside the slice function of EMPLOYEE.FIRST_NAME column.

my intention is if we think this as a variable

FIRST_NAME="Steven"
EMPLOYEE_ID=100
FIRST_NAME[int(EMPLOYEE_ID/EMPLOYEE_ID):4]

the output of this is 'tev'

If I apply the same concept on dataframe it is not working.

I am not getting the substring, instead I am getting NaN. My Python output is as below

https://i.sstatic.net/uZN8x.png

The below is the output from SQL query.

    EMPLOYEE_ID   FIRST_NAME  LAST_NAME TEST1
0           100       Steven       King   Ste
1           101        Neena    Kochhar   Nee
2           102          Lex    De Haan   Lex
3           103    Alexander     Hunold   Ale
4           104        Bruce      Ernst   Bru
5           105        David     Austin   Dav
6           106        Valli  Pataballa   Val
7           107        Diana    Lorentz   Dia
8           108        Nancy  Greenberg   Nan
9           109       Daniel     Faviet   Dan
10          110         John       Chen   Joh
11          111       Ismael    Sciarra   Ism
12          112  Jose Manuel      Urman   Jos
13          113         Luis       Popp   Lui

We can do it by creating a new column, but, as in SQL we can do it directly. I am expecting the same through pandas. Kindly help me on this.

My intended output is the same as above. But, I am writing a generic script that converts SQL query into pandas dataframe. I am stuck in achieving such scenarios where there is a column is being passed as a parameter of a row level function.

Sreenath Reddy · Accepted Answer

df[['EMPLOYEE_ID', 'FIRST_NAME', 'LAST_NAME']].apply(lambda i: i.FIRST_NAME[int(i.EMPLOYEE_ID/i.EMPLOYEE_ID):4], axis=1)

The above is giving me the answer. But it is slower than the other one

%timeit EMPLOYEE.apply(lambda i: i.FIRST_NAME[int(i.EMPLOYEE_ID/i.EMPLOYEE_ID):3], axis=1)

1.38 ms ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit EMPLOYEE[['EMPLOYEE_ID', 'FIRST_NAME', 'LAST_NAME']].assign(pre=EMPLOYEE['FIRST_NAME'].str[1:4])

948 µs ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A pandas dataframe column to pass as an argument of row level function to apply each value of the column to its respective row

If I apply the same concept on dataframe it is not working.

Answers (2)

Related Questions