Reputation: 885
I have two dataframes. The first one, df1, contains only one row:
A B C D E
0 5 8 9 5 0
The second one, df2, has multiple rows but the same number of columns:
D C E A B
0 5 0 3 3 7
1 9 3 5 2 4
2 7 6 8 8 1
3 6 7 7 8 1
4 5 9 8 9 4
5 3 0 3 5 0
6 2 3 8 1 3
7 3 3 7 0 1
8 9 9 0 4 7
9 3 2 7 2 0
In my real data I have many more columns (more than 100). Both dataframes have the same number of columns and the same column names, but the column order differs, as shown in the example.
I need to multiply the two dataframes (element-wise, applying df1's row to every row of df2), but I can't simply use df2.values * df1.values
because the columns are not in the same order. For instance, the second column of df1, B, can't be multiplied by the second column of df2, because that column is C; B is the 5th column of df2.
Is there a simple, Pythonic way to multiply the dataframes that matches columns by name rather than by position?
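To make the mismatch concrete, here is a minimal sketch that rebuilds df1 and the first three rows of df2 from the tables above (the construction code itself is an assumption, since the question only shows the printed frames) and performs the purely positional multiplication:

```python
import pandas as pd

# Values copied from the question; only the first three rows of df2 are used
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame(
    [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]],
    columns=list("DCEAB"),
)

# NumPy multiplies by position, so df2's column D is paired with df1's A,
# C with B, E with C, and so on: the labels are ignored entirely
positional = df2.values * df1.values
print(positional[0])  # [25  0 27 15  0]
```

Matching by label, the first row should instead give D=25, C=0, E=0, A=15, B=56, so the third and fifth entries here are wrong.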
Upvotes: 1
Views: 1482
Reputation: 862641
You can use mul. First convert df1 to a Series by selecting its only row with iloc (the ix indexer used originally was removed in pandas 1.0):
print(df1.iloc[0])
A 5
B 8
C 9
D 5
E 0
Name: 0, dtype: int64
print(df2.mul(df1.iloc[0]))
A B C D E
0 15 56 0 25 0
1 10 32 27 45 0
2 40 8 54 35 0
3 40 8 63 30 0
4 45 32 81 25 0
5 25 0 0 15 0
6 5 24 27 10 0
7 0 8 27 15 0
8 20 56 81 45 0
9 10 0 18 15 0
If you need the final DataFrame to keep the column order of df2, use reindex (reindex_axis, used originally, was removed in pandas 1.0):
print(df2.mul(df1.iloc[0]).reindex(columns=df2.columns))
D C E A B
0 25 0 0 15 56
1 45 27 0 10 32
2 35 54 0 40 8
3 30 63 0 40 8
4 25 81 0 45 32
5 15 0 0 25 0
6 10 27 0 5 24
7 15 27 0 0 8
8 45 81 0 20 56
9 15 18 0 10 0
Another solution is to reorder the index of the Series to match df2.columns with reindex before multiplying:
print(df2.mul(df1.iloc[0].reindex(df2.columns)))
D C E A B
0 25 0 0 15 56
1 45 27 0 10 32
2 35 54 0 40 8
3 30 63 0 40 8
4 25 81 0 45 32
5 15 0 0 25 0
6 10 27 0 5 24
7 15 27 0 0 8
8 45 81 0 20 56
9 15 18 0 10 0
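The label-aligned approach above can be reproduced end to end with the following sketch. It assumes a recent pandas (hence iloc), and it rebuilds the frames from the question's tables using only the first three rows of df2; the variable names row, aligned and same_order are illustrative.

```python
import pandas as pd

# Values copied from the question; only the first three rows of df2 are used
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame(
    [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]],
    columns=list("DCEAB"),
)

row = df1.iloc[0]  # Series whose index holds the column labels A..E

# mul matches the Series index against df2's column labels, not positions
aligned = df2.mul(row)

# Reordering the Series first keeps df2's original column order in the result
same_order = df2.mul(row.reindex(df2.columns))
print(same_order)
```

Both results hold the same products; they differ only in the order of the columns.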
Upvotes: 2
Reputation:
df1[df2.columns] returns a dataframe whose columns are ordered as in df2:
df1
Out[91]:
A B C D E
0 3 8 9 5 0
df1[df2.columns]
Out[92]:
D C E A B
0 5 9 0 3 8
So, you just need:
df2.values * df1[df2.columns].values
This raises a KeyError if df2 has columns that are missing from df1, and it selects only df2's columns even if df1 has more.
As @MaxU noted, since you are operating on numpy arrays, in order to go back to the dataframe structure you will need:
pd.DataFrame(df2.values * df1[df2.columns].values, columns = df2.columns)
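As a self-contained sketch of this NumPy route, the snippet below rebuilds the frames from the question's tables (first three rows of df2 only) and also passes index=df2.index. That argument is an addition not present in the answer; it is there on the assumption that df2's row labels should be kept rather than replaced by a fresh 0..n-1 range.

```python
import pandas as pd

# Values copied from the question; only the first three rows of df2 are used
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame(
    [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]],
    columns=list("DCEAB"),
)

# df1[df2.columns] puts df1's columns in df2's order; the (1, 5) array then
# broadcasts across the (3, 5) array row by row
product = pd.DataFrame(
    df2.values * df1[df2.columns].values,
    columns=df2.columns,
    index=df2.index,  # assumption: preserve df2's row labels
)
print(product)
```

On this data the result matches what df2.mul produces with a Series reordered to df2.columns.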
Upvotes: 4