Varun

Reputation: 135

Pandas iterating over two dataframes

I have two DataFrames. The first one holds my main data. I take one of its columns and build a second DataFrame from it with a string split.

_t1 = df1["TABLE_NAME"].str.split("_", expand=True)
_t1.head()

The first value in df1 is T_STG_PRG_POS_NORM_FAREAST and the second is T_STG_PRG_POS_NORM_EXEC_DBIT.

_t1 looks like:

+---+-----+-----+-----+------+---------+------+------+
| 0 |  1  |  2  |  3  |  4   |    5    |  6   |  7   |
+---+-----+-----+-----+------+---------+------+------+
| T | STG | PRG | POS | NORM | FAREAST | None | None |
| T | STG | PRG | POS | NORM | EXEC    | DBIT | None |
+---+-----+-----+-----+------+---------+------+------+

Now I want to create a column df1["SYSTEM NAME"] that holds FAREAST for row 0 and DBIT for row 1.

I am trying this loop:

for index,row in df1.iterrows():
    for column in _t1:
        if (pd.isna(_t1[column][row])== True):
            df1["SYSTEM NAME"]= _t1[column-1][row]

But I am getting an error: ValueError: cannot index with vector containing NA / NaN values

Upvotes: 1

Views: 97

Answers (2)

Valentino

Reputation: 7361

As @rpanai's answer points out, you don't need _t1.

But if for some reason you do want to use _t1, here is how to do it:

df1['SYSTEM_NAME'] = _t1.apply(lambda x : x[x.notna()].iloc[-1], axis=1)

The lambda function passed to apply returns the last non-NA element of each row.
notna treats both None and np.nan as NA values.

Upvotes: 2

rpanai

Reputation: 13437

I don't understand why you need _t1, since you are only looking for the last element of the split. The following should work, and it is vectorized.

import pandas as pd

df = pd.DataFrame({"TABLE_NAME":["T_STG_PRG_POS_NORM_FAREAST",
                                 "T_STG_PRG_POS_NORM_EXEC_DBIT"]})

df["SYSTEM_NAME"] = df["TABLE_NAME"].str.split("_").str[-1]
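
A quick check of the result, plus a possible variant: since only the last piece is needed, str.rsplit with n=1 splits once from the right instead of splitting on every underscore. The ALT_NAME column is just an illustrative name, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({"TABLE_NAME": ["T_STG_PRG_POS_NORM_FAREAST",
                                  "T_STG_PRG_POS_NORM_EXEC_DBIT"]})

# Split on every underscore and take the last piece of each list.
df["SYSTEM_NAME"] = df["TABLE_NAME"].str.split("_").str[-1]

# Variant: split only once, starting from the right-hand side.
# ALT_NAME is a hypothetical column used here only for comparison.
df["ALT_NAME"] = df["TABLE_NAME"].str.rsplit("_", n=1).str[-1]

print(df["SYSTEM_NAME"].tolist())  # ['FAREAST', 'DBIT']
print(df["ALT_NAME"].tolist())     # ['FAREAST', 'DBIT']
```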

Upvotes: 2
