Research100
Research100

Reputation: 57

Column Apply in Pandas

I am trying to multiply two pandas data_frames with varying number of columns and I would like to end up with a resulting data_frame in the shape of the first data_frame. i.e. if the row and column matches multiple the two cells, otherwise keep the value as in the first data frame. Examples below: What is the most efficient matrix form approach that will not require a for loop?

Thank you!

df1:

enter image description here

df2:

enter image description here

resulting df (df_result = df1 * df2):

enter image description here

Upvotes: 1

Views: 98

Answers (4)

piRSquared
piRSquared

Reputation: 294218

Option 1
Use pd.DataFrame.align

pd.DataFrame.mul(*df1.align(df2, 'left', fill_value=1))

             X     Y     Z
1/1/2017  0.26  0.94  0.22
1/3/2017   NaN  0.63  0.78
1/5/2017  0.73  0.79  0.25
1/6/2017  0.13   NaN  0.31

Option 2
Use pd.DataFrame.reindex

df1 * df2.reindex(df1.index, df1.columns, fill_value=1)

             X     Y     Z
1/1/2017  0.26  0.94  0.22
1/3/2017   NaN  0.63  0.78
1/5/2017  0.73  0.79  0.25
1/6/2017  0.13   NaN  0.31

Option 3
Use pd.DataFrame.mask
As recommended by commenter @CedricZoppolo:
Warning: This is making an assumption that the values of 1 are intended to mark valid positions, like a mask. This doesn't multiply the values. If the intention is to truly multiply values, then don't use this option.

df1.mask(df2.isnull().reindex_like(df1).fillna(False))

             X     Y     Z
1/1/2017  0.26  0.94  0.22
1/3/2017   NaN  0.63  0.78
1/5/2017  0.73  0.79  0.25
1/6/2017  0.13   NaN  0.31


Setup

from numpy import nan as NA

df1 = pd.DataFrame(dict(
    X=[0.26, 0.45, 0.73, 0.13],
    Y=[0.94, 0.63, 0.79, 0.16],
    Z=[0.22, 0.78, 0.25, 0.31]
), ['1/1/2017', '1/3/2017', '1/5/2017', '1/6/2017'])

df2 = pd.DataFrame(dict(
    X=[1, NA, NA, NA, 1, 1],
    XX=[NA, NA, NA, 1, 1, 1],
    Y=[1, 1, 1, 1, 1, NA],
    Y1=[NA, NA, NA, 1, NA, NA],
    YY=[NA, 1, NA, 1, NA, 1]
), ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'], dtype=object)

Upvotes: 4

Cedric Zoppolo
Cedric Zoppolo

Reputation: 4743

You can create a temporary Dataframe for df2 adding a column of 1s when the column of df1 is not present. Then multiply the dataframes and finally selecting the range on output using the index and columns of df1 as follows:

import numpy as np
import pandas as pd
df1 = pd.DataFrame({"Date":["1/1/2017","1/3/2017","1/5/2017","1/6/2017"],
"X":[0.26,0.45,0.73,0.13],
"Y":[0.94,0.63,0.79,0.16],
"Z":[0.22,0.78,0.25,0.31]})
df1["Date"] = pd.to_datetime(df1["Date"])
df1 = df1.set_index(["Date"])

df2 = pd.DataFrame({"Date":["1/1/2017","1/2/2017","1/3/2017","1/4/2017","1/5/2017","1/6/2017"],
"X":[1,np.nan, np.nan, np.nan, 1, 1],
"XX":[np.nan, np.nan, np.nan, 1, 1, 1],
"Y":[1, 1, 1, 1, 1, np.nan],
"Y1":[np.nan, np.nan, np.nan, 1, np.nan, np.nan],
"YY":[np.nan, 1, np.nan, 1, np.nan, 1]})
df2["Date"] = pd.to_datetime(df2["Date"])
df2 = df2.set_index(["Date"])

df2_tmp = df2.copy()
for col in df1.columns:
    if col not in df2.columns:
        df2_tmp[col] = 1
df_out = df1*df2_tmp
df_out = df_out.loc[df1.index,df1.columns]

So if your input is:

>>> df1
               X     Y     Z
Date                        
2017-01-01  0.26  0.94  0.22
2017-01-03  0.45  0.63  0.78
2017-01-05  0.73  0.79  0.25
2017-01-06  0.13  0.16  0.31
>>> df2
              X   XX    Y   Y1   YY
Date                               
2017-01-01  1.0  NaN  1.0  NaN  NaN
2017-01-02  NaN  NaN  1.0  NaN  1.0
2017-01-03  NaN  NaN  1.0  NaN  NaN
2017-01-04  NaN  1.0  1.0  1.0  1.0
2017-01-05  1.0  1.0  1.0  NaN  NaN
2017-01-06  1.0  1.0  NaN  NaN  1.0

Your output will be:

>>> df_out
               X     Y     Z
Date                        
2017-01-01  0.26  0.94  0.22
2017-01-03   NaN  0.63  0.78
2017-01-05  0.73  0.79  0.25
2017-01-06  0.13   NaN  0.31

Upvotes: 0

Alex S
Alex S

Reputation: 582

Find the shared columns, then slice both data frames using this, and then multiply:

In [47]: df1
Out[47]:
   X  Y  Z
0  1  2  3
1  4  5  6
2  7  8  9

In [48]: df2
Out[48]:
   X  XX   Y  Y1  YY
0  1   2 NaN   4 NaN
1  4   5 NaN   4   5
2  7   8   9   2   3

In [49]: shared_cols = [col for col in df1.columns if col in df2.columns]

In [50]: shared_cols
Out[50]: ['X', 'Y']

In [51]: df1[shared_cols] * df2[shared_cols]
Out[51]:
    X   Y
0   1 NaN
1  16 NaN
2  49  72

Upvotes: 0

Sandeep
Sandeep

Reputation: 119

Do not have a single line solution, but the following should work and no loops are involved.

# First DataFrame
df1 = pd.DataFrame()
df1['A'] = [1,2,3]
df1['B'] = [2,4,6]
df1['C'] = [3,6,9]

# Second DataFrame
df2 = pd.DataFrame()
df2['A'] = [1,2,3]
df2['B'] = [2,4,6]
df2['D'] = [3,6,9]

common_cols = list(set(df2.columns) & set(df1.columns))
left_only_cols = list(set(df1.columns) - set(df2.columns))

res_df = df1[common_cols]*df2[common_cols]
res_df[left_only_cols] = df1[left_only_cols]

print res_df

Upvotes: 0

Related Questions