Amanda

Reputation: 885

Multiply dataframes with different lengths according to column names

I have two dataframes. The first one, df1, contains only one row:

   A  B  C  D  E
0  5  8  9  5  0

The second one, df2, has multiple rows but the same number of columns:

   D  C  E  A  B
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1
3  6  7  7  8  1
4  5  9  8  9  4
5  3  0  3  5  0
6  2  3  8  1  3
7  3  3  7  0  1
8  9  9  0  4  7
9  3  2  7  2  0

In my real data I have many more columns (more than 100). Both dataframes have the same number of columns and the same column names, but the column order differs, as shown in the example. I need to multiply the two dataframes (matrix-like multiplication), but I can't simply use df2.values * df1.values because the columns are not in the same order. For instance, the second column of df1, B, can't be multiplied by the second column of df2, because that column is C; B is the fifth column of df2.

Is there a simple, pythonic way to multiply the dataframes by column name rather than by column index?

Upvotes: 1

Views: 1482

Answers (2)

jezrael

Reputation: 862641

You can use mul. df1 is converted to a Series by selecting its first row with iloc:

print(df1.iloc[0])
A    5
B    8
C    9
D    5
E    0
Name: 0, dtype: int64

print(df2.mul(df1.iloc[0]))
    A   B   C   D  E
0  15  56   0  25  0
1  10  32  27  45  0
2  40   8  54  35  0
3  40   8  63  30  0
4  45  32  81  25  0
5  25   0   0  15  0
6   5  24  27  10  0
7   0   8  27  15  0
8  20  56  81  45  0
9  10   0  18  15  0
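
To see the label alignment at work, here is a minimal sketch that rebuilds df1 and the first three rows of df2 from the question. It assumes a recent pandas version, and the names row and result are only illustrative:

```python
import pandas as pd

# df1 and the first three rows of df2, copied from the question
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame(
    [[5, 0, 3, 3, 7],
     [9, 3, 5, 2, 4],
     [7, 6, 8, 8, 1]],
    columns=list("DCEAB"),
)

# The single row of df1 as a Series whose index is the column names
row = df1.iloc[0]

# mul matches the Series index against df2's column labels,
# so each column is multiplied by the value with the same name
result = df2.mul(row)
print(result)
```

Because the match is done by label, column B of df2 is multiplied by 8 (the B value in df1) even though it sits in the last position of df2.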

If you need to change the column order of the final DataFrame, use reindex:

print(df2.mul(df1.iloc[0]).reindex(columns=df2.columns))
    D   C  E   A   B
0  25   0  0  15  56
1  45  27  0  10  32
2  35  54  0  40   8
3  30  63  0  40   8
4  25  81  0  45  32
5  15   0  0  25   0
6  10  27  0   5  24
7  15  27  0   0   8
8  45  81  0  20  56
9  15  18  0  10   0

Another solution is to reorder the Series first by reindexing it with df2.columns:

print(df2.mul(df1.iloc[0].reindex(df2.columns)))
    D   C  E   A   B
0  25   0  0  15  56
1  45  27  0  10  32
2  35  54  0  40   8
3  30  63  0  40   8
4  25  81  0  45  32
5  15   0  0  25   0
6  10  27  0   5  24
7  15  27  0   0   8
8  45  81  0  20  56
9  15  18  0  10   0
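
As a quick check that the two ways of keeping df2's column order agree, the sketch below applies both to the first two rows of the question's data. It assumes a recent pandas version, and the names reordered_after and reordered_before are hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame(
    [[5, 0, 3, 3, 7],
     [9, 3, 5, 2, 4]],
    columns=list("DCEAB"),
)
row = df1.iloc[0]

# Option 1: multiply first, then put the columns back in df2's order
reordered_after = df2.mul(row).reindex(columns=df2.columns)

# Option 2: put the Series in df2's order first, then multiply
reordered_before = df2.mul(row.reindex(df2.columns))

print(reordered_after)
print(reordered_before)
```

Both frames should carry the columns in the order D, C, E, A, B with identical values; which one to use is mostly a matter of taste.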

Upvotes: 2

user2285236

Reputation:

df1[df2.columns] returns a dataframe where the columns are ordered as in df2:

df1
Out[91]: 
   A  B  C  D  E
0  3  8  9  5  0

df1[df2.columns]
Out[92]: 
   D  C  E  A  B
0  5  9  0  3  8

So, you just need:

df2.values * df1[df2.columns].values

This will raise a KeyError if df2 has columns that are missing from df1; if df1 has more columns than df2, only df2's columns are selected.
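
Both cases can be illustrated with a small sketch. The extra columns F and G, and the names df2_extra, df1_extra, raised and selected, are made up for the example and do not appear in the question:

```python
import pandas as pd

df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))

# Hypothetical df2 with a column "F" that df1 does not have
df2_extra = pd.DataFrame([[5, 0, 3, 3, 7, 1]], columns=list("DCEABF"))

try:
    df1[df2_extra.columns]
    raised = False
except KeyError as exc:
    # Selecting a label that is missing from df1 fails
    raised = True
    print("KeyError:", exc)

# Hypothetical df1 with an additional column "G"
df1_extra = df1.assign(G=2)
df2 = df2_extra.drop(columns="F")

# Only df2's columns are picked, in df2's order; "G" is left out
selected = df1_extra[df2.columns]
print(selected)
```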

As @MaxU noted, since you are operating on numpy arrays, in order to go back to the dataframe structure you will need:

pd.DataFrame(df2.values * df1[df2.columns].values, columns = df2.columns)
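
One thing to keep in mind is that rebuilding the frame with only columns= gives it a fresh default index. If df2 has a meaningful index, it can be passed along as well. The sketch below assumes a recent pandas version (it uses to_numpy() in place of .values) and a hypothetical string index on df2:

```python
import pandas as pd

df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list("ABCDE"))
df2 = pd.DataFrame(
    [[5, 0, 3, 3, 7],
     [9, 3, 5, 2, 4]],
    columns=list("DCEAB"),
    index=["x", "y"],  # hypothetical row labels, not from the question
)

# (2, 5) array times (1, 5) array: NumPy broadcasts the single row of df1
product = df2.to_numpy() * df1[df2.columns].to_numpy()

# Restore both the column names and the row labels of df2
out = pd.DataFrame(product, columns=df2.columns, index=df2.index)
print(out)
```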

Upvotes: 4
