
Reputation: 2904

How to add new column based on row condition in pandas dataframe?

I want to add a new column whose value depends on a row-wise condition over two other columns of the same DataFrame.

I have the DataFrame below:

import pandas as pd

df1_data = {'e_id': {0:'101',1:'',2:'103',3:'',4:'105',5:'',6:''},
        'r_id': {0:'',1:'502',2:'',3:'504',4:'',5:'506',6:''}}
df = pd.DataFrame(df1_data)
print(df)

I want to add a new column named "sym".

Conditions:

If the 'e_id' column value is not null, the sym value is the 'e_id' value.
If the 'r_id' column value is not null, the sym value is the 'r_id' value.
If both the 'e_id' and 'r_id' values are null, remove that row from the DataFrame.

I tried the code below:

df1_data = {'e_id': {0:'101',1:'',2:'103',3:'',4:'105',5:''},
        'r_id': {0:'',1:'502',2:'',3:'504',4:'',5:'506'}}

df = pd.DataFrame(df1_data)
print(df)

if df['e_id'].any():
    df['sym'] = df['e_id']
print(df)

if df['r_id'].any():
    df['sym'] = df['r_id']
print(df)

But it gives the wrong output.

Expected output:

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

Upvotes: 1

Answers (3)

Adriel M. Vieira

Reputation: 121

You can start with column 'e_id' and replace its values with the 'r_id' values wherever 'e_id' is empty, using pandas.Series.mask and its 'other' parameter:

df['sym'] = df['e_id'].mask(df['e_id'] == '', other=df['r_id'], axis=0)

Then you just need to remove the rows where sym is empty:

df = df[df.sym!='']
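
Put together as a self-contained script, the two steps above might look like the following. This is a minimal sketch that assumes the seven-row sample data from the question, rebuilt here with plain lists; the name `result` is only for illustration.

```python
import pandas as pd

# Sample data from the question; the last row is empty in both columns.
df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})

# Keep e_id, but where e_id is the empty string take r_id instead.
df['sym'] = df['e_id'].mask(df['e_id'] == '', other=df['r_id'])

# Drop the rows where neither column supplied a value.
result = df[df['sym'] != '']
print(result)
```

Note that this compares against the empty string throughout, so it does not treat NaN as missing; data containing real nulls would need the comparisons adapted.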

Upvotes: 0

piRSquared

Reputation: 294546

pandas
Using mask + fillna + assign

d1 = df.mask(df == '')
df.assign(sym=d1.e_id.fillna(d1.r_id)).dropna(subset=['sym'])

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

How It Works

I mask your '' values, on the assumption that you meant those to be null.
With fillna, I take e_id if it's not null; otherwise I take r_id if it's not null.
dropna with subset=['sym'] drops a row only if the new column is null, and that happens only when both e_id and r_id were null.
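
The three steps above can be traced one at a time. A minimal sketch, assuming the seven-row sample frame from the question (rebuilt here with lists); the intermediate names `sym` and `out` are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})

# Step 1: turn empty strings into missing values so pandas treats them as null.
d1 = df.mask(df == '')

# Step 2: prefer e_id, falling back to r_id where e_id is missing.
sym = d1['e_id'].fillna(d1['r_id'])

# Step 3: attach the column to the original frame and drop rows
# where sym is still missing (both inputs were empty).
out = df.assign(sym=sym).dropna(subset=['sym'])
print(out)
```

Because `assign` is applied to the original `df` rather than to `d1`, the e_id and r_id columns in the result keep their empty strings instead of showing NaN.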

numpy
Using np.where + assign

e = df.e_id.values
r = df.r_id.values
df.assign(
    sym=np.where(
        e != '', e,
        np.where(r != '', r, np.nan)
    )
).dropna(subset=['sym'])

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

numpy v2
Reconstruct the dataframe from values

v = df.values
m = (v != '').any(1)
v = v[m]
c1 = v[:, 0]
c2 = v[:, 1]
pd.DataFrame(
    np.column_stack([v, np.where(c1 != '', c1, c2)]),
    df.index[m], df.columns.tolist() + ['sym']
)

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

Timing

%%timeit
e = df.e_id.values
r = df.r_id.values
df.assign(sym=np.where(e != '', e, np.where(r != '', r, np.nan))).dropna(subset=['sym'])
1000 loops, best of 3: 1.23 ms per loop

%%timeit
d1 = df.mask(df == '')
df.assign(sym=d1.e_id.fillna(d1.r_id)).dropna(subset=['sym'])
100 loops, best of 3: 2.44 ms per loop

%%timeit
v = df.values
m = (v != '').any(1)
v = v[m]
c1 = v[:, 0]
c2 = v[:, 1]
pd.DataFrame(
    np.column_stack([v, np.where(c1 != '', c1, c2)]),
    df.index[m], df.columns.tolist() + ['sym']
)
1000 loops, best of 3: 204 µs per loop
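
The timings above were taken on the seven-row sample, where fixed overhead dominates. To see how two of the variants compare on more rows, a sketch like the following could be used. The 7,000-row size, the function names `with_mask` and `with_where`, and the repeat count are arbitrary choices for illustration, and `to_numpy()` is used in place of `.values` as an assumption to get plain NumPy arrays.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})
# Repeat the 7 sample rows 1,000 times (an arbitrary size).
big = pd.concat([base] * 1000, ignore_index=True)


def with_mask(df):
    # pandas variant: mask + fillna + assign
    d1 = df.mask(df == '')
    return df.assign(sym=d1.e_id.fillna(d1.r_id)).dropna(subset=['sym'])


def with_where(df):
    # numpy variant: nested np.where + assign
    e = df.e_id.to_numpy()
    r = df.r_id.to_numpy()
    sym = np.where(e != '', e, np.where(r != '', r, np.nan))
    return df.assign(sym=sym).dropna(subset=['sym'])


# Check that both variants agree before timing them.
same = with_mask(big)['sym'].tolist() == with_where(big)['sym'].tolist()
print('results agree:', same)

for fn in (with_mask, with_where):
    seconds = timeit.timeit(lambda: fn(big), number=20)
    print(fn.__name__, seconds / 20, 'seconds per call')
```

Absolute numbers will depend on the machine and on the pandas and NumPy versions, so only the relative ordering on your own data is meaningful.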

Upvotes: 2

jezrael

Reputation: 863741

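First filter out the rows where both columns are empty, using boolean indexing with any:

# pass axis by keyword; pandas 2.x no longer accepts it positionally
df = df[(df != '').any(axis=1)]
#alternatively
#df = df[(df['e_id'] != '') | (df['r_id'] != '')]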

Then use mask with combine_first:

df['sym'] = df['e_id'].mask(df['e_id'] == '').combine_first(df['r_id'])
print (df)

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

Numpy solution with filtering and numpy.where:

df = df[(df['e_id'] != '') | (df['r_id'] != '')]
e_id = df.e_id.values
r_id = df.r_id.values
df['sym'] = np.where(e_id != '', e_id, r_id)
print (df)
  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506
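
In some pandas versions, assigning a new column to a frame that was just produced by boolean filtering can raise a SettingWithCopyWarning. One way to sidestep that is to take an explicit copy after filtering. The sketch below wraps the filter-then-np.where pattern in a function; `add_sym` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def add_sym(frame):
    """Hypothetical helper: return a filtered copy of `frame` with a 'sym' column."""
    # Keep rows where at least one of the two columns is non-empty.
    keep = (frame['e_id'] != '') | (frame['r_id'] != '')
    out = frame[keep].copy()  # explicit copy, so the assignment below is unambiguous
    # Prefer e_id; fall back to r_id where e_id is the empty string.
    out['sym'] = np.where(out['e_id'] != '', out['e_id'], out['r_id'])
    return out


df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})
result = add_sym(df)
print(result)
```

The input frame is left unchanged, which also makes the function safe to call more than once on the same data.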

Upvotes: 2
