Reputation: 1376
I have two data frames as shown below df1 and df2. I want to create a third dataframe i.e. df as shown below. What would be the appropriate way?
df1={'id':['a','b','c'],
'val':[1,2,3]}
df1=pd.DataFrame(df)
df1
id val
0 a 1
1 b 2
2 c 3
df2={'yr':['2010','2011','2012'],
'val':[4,5,6]}
df2=pd.DataFrame(df2)
df2
yr val
0 2010 4
1 2011 5
2 2012 6
df={'id':['a','b','c'],
'val':[1,2,3],
'2010':[4,8,12],
'2011':[5,10,15],
'2012':[6,12,18]}
df=pd.DataFrame(df)
df
id val 2010 2011 2012
0 a 1 4 5 6
1 b 2 8 10 12
2 c 3 12 15 18
I can basically convert df1 and df2 as 1 by n matrices and get n by n result and assign it back to the df1. But is there any easy pandas way?
Upvotes: 0
Views: 228
Reputation: 7230
We can do it in one line like this:
df1.join(df1.val.apply(lambda x: x * df2.set_index('yr').val))
or like this:
df1.join(df1.set_index('id') @ df2.set_index('yr').T, on='id')
Done.
Let's see what's going on here.
To find the output of multiplication of each df1.val
by values in df2.val
we use apply
:
df1['val'].apply(lambda x: x * df2.val)
The function inside will obtain df1.val
s one by one and multiply each by df2.val
element-wise (see broadcasting for details if needed). As far as df2.val
is a pandas sequence, the output is a data frame with indexes df1.val.index
and columns df2.val.index
. By df2.set_index('yr')
we force years to be indexes before multiplication so they will become column names in the output.
DataFrame.join
is joining frames index-on-index by default. So due to identical indexes of df1
and the multiplication output, we can apply df1.join( <the output of multiplication> )
as is.
At the end we get the desired matrix with indexes df1.index
and columns id, val, *df2['yr']
.
The second variant with @
operator is actually the same. The main difference is that we multiply 2-dimentional frames instead of series. These are the vertical and horizontal vectors, respectively. So the matrix multiplication will produce a frame with indexes df1.id
and columns df2.yr
and element-wise multiplication as values. At the end we connect df1
with the output on identical id
column and index respectively.
Upvotes: 1
Reputation: 73
Your question is a bit vague. But I suppose you want to do something like that:
df = pd.concat([df1, df2], axis=1)
Upvotes: 0
Reputation: 998
This works for me:
df2 = df2.T
new_df = pd.DataFrame(np.outer(df1['val'],df2.iloc[1:]))
df = pd.concat([df1, new_df], axis=1)
df.columns = ['id', 'val', '2010', '2011', '2012']
df
The output I get:
id val 2010 2011 2012
0 a 1 4 5 6
1 b 2 8 10 12
2 c 3 12 15 18
Upvotes: 0