Reputation: 333
I have two dataframes df1
and df2
:
df1 = pd.DataFrame({"a" : [1,2,3,4,5,6,7,8,9,10,11,12],
"b" : [-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12],
"t" : [1,2,3,4,5,6,7,8,9,10,11,12]})
df1.index = [1,1,1,2,2,2,3,3,3,4,4,5]
df2 = pd.DataFrame({"a" : [10,20,30],
"b" : [-10,-20,-30]})
df2.index = [2,3,4]
They look like this:
df1
a b t
1 1 -1 1
1 2 -2 2
1 3 -3 3
2 4 -4 4
2 5 -5 5
2 6 -6 6
3 7 -7 7
3 8 -8 8
3 9 -9 9
4 10 -10 10
4 11 -11 11
5 12 -12 12
df2
a b
2 10 -10
3 20 -20
4 30 -30
So the columns of df2
are a subset of the columns of df1
. I want to multiply the rows of df1
with the rows of df2
where they have the same index and only retain the rows that consist of the intersection between their indices, i.e. basically
ls_keep = []
for i in range(len(df1)):
for j in range(len(df2)):
if df1.index[i] == df2.index[j]:
df1.iloc[i]["a"] = df1.iloc[i]["a"] * df2.iloc[j]["a"]
df1.iloc[i]["b"] = df1.iloc[i]["b"] * df2.iloc[j]["b"]
ls_keep.append(i)
df1 = df1.iloc[ls_keep]
This gives me
a b t
2 40 40 4
2 50 50 5
2 60 60 6
3 140 140 7
3 160 160 8
3 180 180 9
4 300 300 10
4 330 330 11
This code works as intended, but is very unprofessional and long in case there are more than just two columns. Is there a way to optimize it using functions of pandas
?
Upvotes: 2
Views: 1857
Reputation: 862851
First filter only matched indices to new DataFrame
by boolean indexing
and isin
and then multiple by mul
with df2.columns
for processing same columns in both DataFrame
s:
df11 = df1[df1.index.isin(df2.index)].copy()
df11[df2.columns] = df11[df2.columns].mul(df2)
print (df11)
a b t
2 40 40 4
2 50 50 5
2 60 60 6
3 140 140 7
3 160 160 8
3 180 180 9
4 300 300 10
4 330 330 11
Upvotes: 3