Raj Hassani

Reputation: 1667

Conditionally concat a DataFrame in Python using pandas

I have two data frames:

df

       A  B
0  test1  1
1  test2  4
2  test3  1
3  test4  2

df1

       C
0  test3
1  test5

I want to conditionally merge them into a new data frame:

df2

       A  B
0  test1  1
1  test2  4
2  test3  0
3  test4  2
4  test5  0

The new data frame should be built as follows: if a value in column A equals a value in column C, its column B value is set to a default of 0 during the merge; if a value in column C does not exist in column A, it is added to the data frame with a default B value of 0, as shown above.

Upvotes: 1

Views: 2962

Answers (3)

Colonel Beauvel

Reputation: 31161

Here is a simple approach. Rename col C to A, concatenate the two dataframes, and set the missing values to 0. Finally, a small hack with groupby handles values that appear several times in col A by keeping the last row, which is the one with 0:

pd.concat([df,df1.rename(columns={'C':'A'})]).fillna(0).groupby('A', as_index=False).last()

       A  B
0  test1  1
1  test2  4
2  test3  0
3  test4  2
4  test5  0
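
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same concat and groupby idea, assuming the sample frames from the question. The name `stacked` is only illustrative, and the final `astype` is an addition to restore integer values, since filling NaN leaves column B as float.

```python
import pandas as pd

# Sample frames from the question.
df = pd.DataFrame({"A": ["test1", "test2", "test3", "test4"], "B": [1, 4, 1, 2]})
df1 = pd.DataFrame({"C": ["test3", "test5"]})

# Stack df1 under df, treating column C as column A; B is NaN for the new rows.
stacked = pd.concat([df, df1.rename(columns={"C": "A"})], ignore_index=True)

df2 = (
    stacked.fillna({"B": 0})       # fill before grouping: last() skips NaN values
    .groupby("A", as_index=False)  # one group per distinct value of A
    .last()                        # keep the last row, i.e. the one coming from df1
    .astype({"B": int})            # assumption: integer output is wanted
)
print(df2)
```

Filling before grouping matters here: `last()` ignores missing values, so without the fill the original B value for test3 would survive.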

Upvotes: 1

Zero

Reputation: 76917

You could do an outer merge on the A and C columns:

In [31]: tmp = df.merge(df1, left_on='A', right_on='C', how='outer')

Replace the NaN values in column A with the values from column C:

In [32]: tmp['A'] = tmp['A'].fillna(tmp['C'])

Then set B to 0 wherever C is not null:

In [33]: tmp.loc[tmp['C'].notnull(), 'B'] = 0

In [34]: tmp[['A', 'B']]
Out[34]:
       A  B
0  test1  1
1  test2  4
2  test3  0
3  test4  2
4  test5  0
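
The steps above can be collected into one runnable sketch, assuming the sample frames from the question and a recent pandas version. The final `astype` is an addition, used because the outer merge introduces NaN and turns column B into float.

```python
import pandas as pd

# Sample frames from the question.
df = pd.DataFrame({"A": ["test1", "test2", "test3", "test4"], "B": [1, 4, 1, 2]})
df1 = pd.DataFrame({"C": ["test3", "test5"]})

# Outer merge keeps every row from both frames; unmatched cells become NaN.
tmp = df.merge(df1, left_on="A", right_on="C", how="outer")

# Rows that exist only in df1 have no A value yet, so take it from C.
tmp["A"] = tmp["A"].fillna(tmp["C"])

# Any row with a non-null C came from df1 (matched or new): reset B to 0.
tmp.loc[tmp["C"].notna(), "B"] = 0

# Drop the helper column; the integer cast is an assumption about the desired output.
df2 = tmp[["A", "B"]].astype({"B": int})
print(df2)
```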

Upvotes: 1

Liam Foley

Reputation: 7822

df2 = pd.merge(df,df1,left_on='A',right_on='C',how="outer")
df2['A'] = df2['A'].fillna(df2['C'])
df2.loc[df2['A'] == df2['C'],'B'] = 0
df2 = df2[['A','B']]

I like the option below better, but it isn't exactly what you asked for.

df1['B'] = 0
df1.columns = ['A','B']
df2 = pd.concat([df,df1])
df2 = df2.drop_duplicates('A', keep='last')
df2 = df2.sort_values('A').reset_index(drop=True)
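
A self-contained sketch of this second option, assuming the sample frames from the question. Unlike the snippet above, it builds a separate frame (here called `defaults`, an illustrative name) instead of modifying `df1` in place.

```python
import pandas as pd

# Sample frames from the question.
df = pd.DataFrame({"A": ["test1", "test2", "test3", "test4"], "B": [1, 4, 1, 2]})
df1 = pd.DataFrame({"C": ["test3", "test5"]})

# Copy of df1 shaped like df: column C renamed to A, plus B set to the default 0.
defaults = df1.rename(columns={"C": "A"}).assign(B=0)

df2 = (
    pd.concat([df, defaults], ignore_index=True)
    .drop_duplicates("A", keep="last")  # on a clash, the row from df1 wins
    .sort_values("A")
    .reset_index(drop=True)
)
print(df2)
```

Because both frames carry an integer B column before concatenation, no NaN appears and B stays an integer column.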

Upvotes: 1
