Rei
Rei

Reputation: 329

Difference between 2 columns 2 dataframes based on index

I have 2 dataframes I would like to analyze.

df1:

          City         Time    Day
5866128   Los Angeles  3.5     01/09/2019
5172728   New York     14      09/09/2019
4787428   Boston       9       10/09/2019

df2:

          City         Time    Day
5866128   Los Angeles  3.5     01/09/2019
2478987   Denver       10      07/09/2019
5172728   New York     24      09/09/2019
4787428   Boston       4       10/09/2019
1478712   Austin       7       10/09/2019

I would like to create a third dataframe that would contain only the rows where df2['Time']-df1['Time'] != 0 based on the index and the rows that are only available in df2.

Desired output :

          City         Time     Day
2478987   Denver       10       07/09/2019
5172728   New York     10       09/09/2019
4787428   Boston       -5       10/09/2019
1478712   Austin        7       10/09/2019

I tried to play with numpy.where(), but I can't make it work to compare only the same index.

Thanks

Upvotes: 1

Views: 51

Answers (2)

Mark Wang
Mark Wang

Reputation: 2757

df2['Time'] = df2['Time'].sub(df1['Time'], fill_value=0)
df2[df2.Time.ne(0)]

or

df2.assign(Time = df2['Time'].sub(df1['Time'], fill_value= 0)).loc[lambda x:x.Time.ne(0)]

Upvotes: 2

Georgina Skibinski
Georgina Skibinski

Reputation: 13387

Try this one:

df3 = (df1.join(df2, rsuffix="_2") 
df3 = df3.loc[df["Time"]!=df["Time_2"]]
df3["Time"]=df3["Time_2"]-df3["Time"]
df3=df3.drop(["Time_2", "Day_2", "City_2"], axis=1)

Upvotes: 0

Related Questions