Multiply dataframes with different lengths, matching columns by name

Question

I have two dataframes. The first one, df1, contains only one row:

   A  B  C  D  E
0  5  8  9  5  0

The second one, df2, has multiple rows but the same number of columns:

   D  C  E  A  B
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1
3  6  7  7  8  1
4  5  9  8  9  4
5  3  0  3  5  0
6  2  3  8  1  3
7  3  3  7  0  1
8  9  9  0  4  7
9  3  2  7  2  0

In my real data I have many more columns (more than 100). Both dataframes have the same number of columns and the same column names, but the columns are in a different order, as shown in the example.

I need to multiply the two dataframes element-wise, so that each column of df2 is multiplied by the df1 value with the same column name. I can't simply compute df2.values * df1.values, because that pairs columns by position: for instance, the second column of df1 is B, but the second column of df2 is C, and B is actually the fifth column of df2.

Is there a simple, Pythonic way to multiply the dataframes by matching column names rather than column positions?
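To make the mismatch concrete, here is a minimal sketch using df1 and only the first three rows of df2 from the example above. It compares the positional product with a column-by-column product keyed on names; the variable names naive and by_name are illustrative, not from the question.

```python
import pandas as pd

# df1 and the first three rows of df2, copied from the example above.
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame(
    [[5, 0, 3, 3, 7],
     [9, 3, 5, 2, 4],
     [7, 6, 8, 8, 1]],
    columns=list("DCEAB"),
)

# Positional product: df2's D column is multiplied by df1's A column,
# df2's C by df1's B, and so on, which is not the intended pairing.
naive = df2.values * df1.values

# Reference result keyed on names: each df2 column times df1's value for that name.
by_name = pd.DataFrame({col: df2[col] * df1[col].iloc[0] for col in df2.columns})

print(naive[0].tolist())         # [25, 0, 27, 15, 0]
print(by_name.iloc[0].tolist())  # [25, 0, 0, 15, 56]
```

The two first rows already disagree in the E and B positions, which is exactly the misalignment described in the question.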

user2285236 · Accepted Answer

df1[df2.columns] returns a dataframe where the columns are ordered as in df2:

df1
Out[91]: 
   A  B  C  D  E
0  3  8  9  5  0

df1[df2.columns]
Out[92]: 
   D  C  E  A  B
0  5  9  0  3  8

So, you just need:

df2.values * df1[df2.columns].values
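A runnable sketch of that one-liner, using the question's df1 and the first three rows of df2. Because df1[df2.columns].values has shape (1, 5), NumPy broadcasting repeats that single row across every row of df2 before the element-wise product.

```python
import pandas as pd

# df1 and the first three rows of df2 from the question.
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame(
    [[5, 0, 3, 3, 7],
     [9, 3, 5, 2, 4],
     [7, 6, 8, 8, 1]],
    columns=list("DCEAB"),
)

# Reorder df1's columns to D, C, E, A, B, then drop to a NumPy array.
aligned = df1[df2.columns].values  # shape (1, 5): [[5, 9, 0, 5, 8]]

# (3, 5) * (1, 5): the single row is broadcast over every row of df2.
product = df2.values * aligned

print(product.tolist())
# [[25, 0, 0, 15, 56], [45, 27, 0, 10, 32], [35, 54, 0, 40, 8]]
```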

Indexing with df1[df2.columns] raises a KeyError if df2 has columns that df1 lacks, and it selects only df2's columns, so any extra columns in df1 are silently ignored.
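A small sketch of the missing-column case, using hypothetical frames where df2 has an extra column F that df1 lacks. If you would rather get NaN than an exception, DataFrame.reindex(columns=...) is one option; note this swaps in reindex and is not part of the original answer.

```python
import pandas as pd

# Hypothetical frames: df2 has an extra column "F" that df1 does not have.
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame([[5, 0, 3, 3, 7, 1]], columns=list("DCEABF"))

# Selecting with a list that contains a missing label raises KeyError.
raised = False
try:
    df1[df2.columns]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# reindex fills labels missing from df1 with NaN instead of raising,
# so the product is NaN in the "F" column (and the result becomes float).
aligned = df1.reindex(columns=df2.columns)
product = df2.values * aligned.values
print(product)
```

Whether NaN or an exception is preferable depends on whether a missing column indicates bad input in your data.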

As @MaxU noted, since you are operating on NumPy arrays, you need to wrap the result to get a DataFrame back:

pd.DataFrame(df2.values * df1[df2.columns].values, columns=df2.columns, index=df2.index)
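An alternative that stays in pandas, offered as a sketch rather than as part of the accepted answer: df1.iloc[0] is a Series indexed by column name, and DataFrame.mul(..., axis="columns") aligns that index against df2's columns before multiplying. Selecting [df2.columns] afterwards restores df2's column order, since the aligned result is not guaranteed to keep it. The names via_numpy and via_mul are illustrative.

```python
import pandas as pd

# df1 and the first three rows of df2 from the question.
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame(
    [[5, 0, 3, 3, 7],
     [9, 3, 5, 2, 4],
     [7, 6, 8, 8, 1]],
    columns=list("DCEAB"),
)

# The answer's approach, wrapped back into a DataFrame with df2's labels.
via_numpy = pd.DataFrame(
    df2.values * df1[df2.columns].values,
    columns=df2.columns,
    index=df2.index,
)

# Label-aligned alternative: pandas matches the Series index (A..E)
# to df2's column names, then we reselect df2's column order.
via_mul = df2.mul(df1.iloc[0], axis="columns")[df2.columns]

print(via_mul.equals(via_numpy))  # True
```

The .mul version keeps the index and column labels without a round trip through NumPy; the NumPy version makes the positional alignment explicit.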
