Pythonnorra

Reputation: 29

Multiply column in dataframe with one row in another dataframe

I'm having trouble multiplying values from two different dataframes. I'm doing a PCA regression and want to multiply all my loadings with the original values.

For example:

PCA dataframe

PC1 PC2
X 0 1
X1 1 2
X2 2 1
X3 2 1
X4 3 2
X5 5 4

Original dataframe:

A A1 A2 A3 A4 A5
1 1 3 4 1 2 4
2 8 5 3 2 1 2
3 9 3 5 1 3 1

I then want to multiply PC1 with every row in the original dataframe such that:

PC1 = 0xA + 1xA1 + 2xA2 + 2xA3 + 3xA4 + 5xA5

Inserting the rows from the second dataframe:

First row: PC1 = 0x1 + 1x3 + 2x4 + 2x1 + 3x2 + 5x4 = 39
Second row: PC1 = 0x8 + 1x5 + 2x3 + 2x2 + 3x1 + 5x2 = 28
Third row: PC1 = 0x9 + 1x3 + 2x5 + 2x1 + 3x3 + 5x1 = 29

New dataframe:

PC1 PC2
1 39
2 28
3 29

And so on.

My PCA dataframe has the shape (14,4) and my value dataframe has the shape (159,14).

Upvotes: 0

Views: 1155

Answers (3)

keramat

Reputation: 4543

Use:

import numpy as np
import pandas as pd

string = """    PC1 PC2
X   0   1
X1  1   2
X2  2   1
X3  2   1
X4  3   2
X5  5   4"""
string2 = """A  A1  A2  A3  A4  A5
1   3   4   1   2   4
8   5   3   2   1   2
9   3   5   1   3   1"""
# Split each line on any run of whitespace
data1 = [x.split() for x in string.split('\n')]
data2 = [x.split() for x in string2.split('\n')]

# df1: drop the row labels (X, X1, ...); the header row holds the column names
df1 = pd.DataFrame(np.array([x[1:] for x in data1[1:]], dtype=float), columns=data1[0])
df2 = pd.DataFrame(np.array(data2[1:], dtype=float), columns=data2[0])

# Solution: (3, 6) values times (6, 2) loadings gives a (3, 2) result
pd.DataFrame(np.dot(df2, df1), columns=['PC1', 'PC2'])

Output:

    PC1   PC2
0  39.0  32.0
1  28.0  33.0
2  29.0  31.0
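As a sanity check, here is a minimal sketch that compares the hand formula from the question with the matrix product. The names `loadings`, `values`, `manual_pc1` and `scores` are only illustrative; the numbers are the example data from the question.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
loadings = pd.DataFrame(
    {"PC1": [0, 1, 2, 2, 3, 5], "PC2": [1, 2, 1, 1, 2, 4]},
    index=["X", "X1", "X2", "X3", "X4", "X5"],
)
values = pd.DataFrame(
    [[1, 3, 4, 1, 2, 4], [8, 5, 3, 2, 1, 2], [9, 3, 5, 1, 3, 1]],
    columns=["A", "A1", "A2", "A3", "A4", "A5"],
    index=[1, 2, 3],
)

# Hand formula: PC1 = 0*A + 1*A1 + 2*A2 + 2*A3 + 3*A4 + 5*A5, one sum per row.
pc1_weights = loadings["PC1"].tolist()
manual_pc1 = [
    sum(w * v for w, v in zip(pc1_weights, row))
    for row in values.to_numpy().tolist()
]

# The matrix product computes the same sums for every component at once.
scores = np.dot(values.to_numpy(), loadings.to_numpy())

print(manual_pc1)    # [39, 28, 29]
print(scores[:, 0])  # [39 28 29]
```

The first column of `scores` matches the row-by-row sums, and the second column holds the same calculation for PC2.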

Upvotes: 2

jezrael

Reputation: 862511

If the number of rows in the first DataFrame equals the number of columns in the second DataFrame, you can multiply with DataFrame.dot by passing a NumPy array, then rename the resulting columns with df1.columns:

df = df2.dot(df1.to_numpy()).rename(columns=dict(enumerate(df1.columns)))
print (df)
   PC1  PC2
1   39   32
2   28   33
3   29   31
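The reason for `to_numpy()` is that `DataFrame.dot` aligns the column labels of the left frame with the index labels of the right one. A minimal sketch, assuming the example frames from the question (the variable names `direct_ok`, `scores` and `relabelled` are just for illustration):

```python
import pandas as pd

df1 = pd.DataFrame(
    {"PC1": [0, 1, 2, 2, 3, 5], "PC2": [1, 2, 1, 1, 2, 4]},
    index=["X", "X1", "X2", "X3", "X4", "X5"],
)
df2 = pd.DataFrame(
    [[1, 3, 4, 1, 2, 4], [8, 5, 3, 2, 1, 2], [9, 3, 5, 1, 3, 1]],
    columns=["A", "A1", "A2", "A3", "A4", "A5"],
    index=[1, 2, 3],
)

# Passing the DataFrame directly fails: labels A..A5 do not match X..X5.
try:
    df2.dot(df1)
    direct_ok = True
except ValueError:
    direct_ok = False

# Option 1: drop the labels of df1 and multiply purely by position.
scores = df2.dot(df1.to_numpy())
scores.columns = df1.columns

# Option 2: relabel df1's index so that label alignment succeeds.
relabelled = df2.dot(df1.set_axis(df2.columns, axis=0))

print(direct_ok)                  # False
print(scores.equals(relabelled))  # True
```

Both options rely on the rows of `df1` being in the same order as the columns of `df2`; if the order can differ, relabel or reindex explicitly first.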

Upvotes: 2

Mortz

Reputation: 4879

You are looking for a dot product, which you can get with np.dot:

print(df)
    2  3
1       
X   0  1
X1  1  2
X2  2  1
X3  2  1
X4  3  2
X5  5  4
print(xf)
   2  3  4  5  6  7
1                  
1  1  3  4  1  2  4
2  8  5  3  2  1  2
3  9  3  5  1  3  1
print(pd.DataFrame(np.dot(xf, df), columns=['PC1', 'PC2']))
   PC1  PC2
0   39   32
1   28   33
2   29   31
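The same approach should carry over to the shapes mentioned in the question, (159,14) and (14,4). Below is a rough sketch with random placeholder data, since the real values are not shown; the column names PC1 to PC4 are an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # placeholder data only, not real PCA output

# Shapes from the question: 159 observations x 14 variables,
# and 14 variables x 4 components (component names are assumed).
values = pd.DataFrame(rng.normal(size=(159, 14)))
loadings = pd.DataFrame(
    rng.normal(size=(14, 4)), columns=["PC1", "PC2", "PC3", "PC4"]
)

# The inner dimensions (14 and 14) must match for the product to exist.
assert values.shape[1] == loadings.shape[0]

# `@` is the matrix-multiplication operator, equivalent to np.dot for 2-D arrays.
scores = pd.DataFrame(
    values.to_numpy() @ loadings.to_numpy(),
    index=values.index,
    columns=loadings.columns,
)

print(scores.shape)  # (159, 4)
```

Each row of `scores` holds one observation's value for every component, so the result has one row per original row and one column per component.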

Upvotes: 2
