ipj
ipj

Reputation: 3598

Select one column of data frame per row using value stored in another column and modify it

Given an example dataframe df:

df = (pd.DataFrame.from_dict({'a':[np.nan, 10, 20], 
                              'b':[np.nan, np.nan, 20], 
                              'c':[np.nan, 30, np.nan],
                              'c':[40, np.nan, np.nan],
                              'col':['b','c','a']})
)

      a     b     c col
0   NaN   NaN  40.0   b
1  10.0   NaN   NaN   c
2  20.0  20.0   NaN   a

I want to fillna(0) at each row of df only selected column according to value stored in df.col.

The output should be:

      a     b     c col
0   NaN   0.0  40.0   b
1  10.0   NaN   0.0   c
2  20.0  20.0   NaN   a    

I've found an iterative solution:

for index, row in df.iterrows():
    df.loc[index, row.col] = np.nanmax([df.loc[index, row.col],0])

but it is slow, and hard to use is chaining operations (assign method).

Is there another, maybe vectorized way or apply version of getting desired result?

Upvotes: 1

Views: 99

Answers (2)

Andrej Kesely
Andrej Kesely

Reputation: 195553

You can use .transform() with np.where:

print(
    df.transform(
        lambda x: np.where(x.isna() & (x.index == x["col"]), 0, x),
        axis=1,
    )
)

Prints:

     a    b    c col
0  NaN    0   40   b
1   10  NaN    0   c
2   20   20  NaN   a

Upvotes: 2

Erfan
Erfan

Reputation: 42926

Method 1: using stack and unstack

df = df.set_index('col').stack(dropna=False)
m1 = df.index.get_level_values(0) == df.index.get_level_values(1)
m2 = df.isna()

df.loc[m1 & m2] = 0
df = df.unstack(level=1).reset_index()

Method 2: melt and pivot

We can melt the dataframe, then find the rows which we need to fill the NaN. Finally pivot our dataframe back in the format we want:

m = df.melt(id_vars='col')
m['value'] = np.where(
    m['col'] == m['variable'], 
    m['value'].fillna(0), 
    m['value']
)

df = m.pivot_table(
    index='col', 
    columns='variable', 
    values='value'
).reindex(df['col'])
df = df.reset_index().rename_axis(columns=None)

Output

  col     a     b     c
0   b   NaN  0.00 40.00
1   c 10.00   NaN  0.00
2   a 20.00 20.00   NaN

Upvotes: 2

Related Questions