A H
A H

Reputation: 2580

vectorise multiplication multiple dataframes based on column combinations

Say we have dataframes set as follows:

df1 = pd.DataFrame(np.random.randint(0, 2, (10, 2)), columns=['Cow', 'Sheep'])
df2 = pd.DataFrame(np.random.randint(0, 2, (10, 5)), columns=['Hungry', 'Scared', 'Happy', 'Bored', 'Sad'])
df3 = pd.DataFrame(np.random.randint(0, 2, (10, 2)), columns=['Davids', 'Michaels'])
df1.index.name = df2.index.name = df3.index.name = 'id'

combos_to_test = pd.DataFrame([('Davids', 'Cow', 'Hungry'),
                               ('Michaels', 'Cow', 'Hungry'),
                               ('Davids', 'Cow', 'Scared'),
                               ('Michaels', 'Cow', 'Scared'),
                               ('Michaels', 'Sheep', 'Scared'),
                               ('Davids', 'Sheep', 'Happy'),
                               ('Michaels', 'Sheep', 'Happy'),])

example :

   DF1:           DF2:                                               DF3:
   id Cow Sheep    id   Hungry  Scared  Happy   Bored   Sad          id    Davids   Michaels            
    0   0   1       0     0       1        1       0    1            0      1          0  
    1   0   0       1     1       0        0       1    1            1      0          1  
    2   0   0       2     1       0        0       1    1            2      0          0  
    3   1   0       3     0       0        1       0    1            3      0          1  
    4   1   0       4     0       0        1       1    0            4      0          1  
    5   1   1       5     0       0        1       1    0            5      1          0  
    6   1   1       6     1       0        1       1    0            6      1          0  
    7   1   0       7     1       1        1       1    0            7      1          1  
    8   1   1       8     1       1        1       1    0            8      1          0  
    9   1   0       9     0       1        1       0    0            9      1          0    

And I need a 4th dataframe, which finds (for each combination), when each combos_to_test is a column.

The way I plan on doing this is to change the columns into:

df1.columns = Cow, Cow, Cow, Cow, Sheep, Sheep, Sheep
df2.columns = Hungry, Hungry, Scared, Scared, Happy, Happy
df3.columns = David, Michael, David, Michael, Michael, David, Michael

Then renaming all the cols to col1, col2, col3, ..., col8

and then multiplying each dataframe by eachother (which will vectorise it - but take large amounts of memory).

My dataset is obviously much bigger, and will be using numpy/pandas.

The output df should look like:

  ('Davids', 'Cow', 'Hungry') | ('Michaels', 'Cow', 'Hungry') | ('Davids', 'Cow', 'Scared') | ('Michaels', 'Cow', 'Scared') | ...
 1)         0                               1                             0                                 0
 2)         0                               0                             0                                 0
 3)         0                               1                             0                                 0
 4)         0                               0                             1                                 0
 5)         0                               0                             0                                 0
 6)         0                               0                             0                                 0
 7)         0                               0                             0                                 0
 8)         0                               0                             1                                 1
 9)         1                               0                             0                                 0
10)         1                               0                             0                                 0

Upvotes: 1

Views: 75

Answers (2)

piRSquared
piRSquared

Reputation: 294508

I can do this with pd.concat

df = pd.concat([df1, df2, df3], axis=1)

pd.concat({
    ctt: df.reindex(columns=ctt).prod(1)
    for ctt in map(tuple, combos_to_test.values)
}, axis=1)

   Davids              Michaels                    
      Cow        Sheep      Cow        Sheep       
   Hungry Scared Happy   Hungry Scared Happy Scared
id                                                 
0       0      0     0        0      0     0      0
1       1      1     0        1      1     0      1
2       0      0     0        0      0     0      0
3       0      0     0        0      0     1      0
4       1      1     0        0      0     0      0
5       0      0     0        1      1     1      1
6       0      0     0        0      0     0      0
7       0      0     0        0      0     0      0
8       0      0     0        0      0     0      0
9       0      0     0        0      0     0      0

Upvotes: 3

Caio Sampaio
Caio Sampaio

Reputation: 49

The easiest way to copy a column is to use just:

df1['Cow_copy'] = df1['Cow']

If you want to copy many columns, you could make a List of columns and loop through it and using the code above for each one.

Upvotes: 0

Related Questions