Reputation: 83
I’m new to pandas so forgive me if this is an easy question. I would like to create a dataframe by using elements from two different dataframes. What is the best way to do this? It looks like I will have to use some kind of nested for loop:
for a in df1.itertuples()
for b in df2.itertuples()
if df1.vega(a) = df2.vega(b)
delta = df2.gamma(a) – df1.gamma(b)
if delta > 0 or delta <0
zelda = df2.blabla(b)
EDIT: both dataframes have the same column names:
df1 = pd.DataFrame({'vega': [bla1, bla2, bla3, bla4], 'gamma': [242340.1, 466530.2, 325346, 34654365]})
df2 = pd.DataFrame({'vega': [bla6, bla2, bla7, bla4], 'gamma': [3454334, 896530.2, 32543, 34654]})
EDIT2: In this example the new dataframe would be:
zelda|
----- |
430000 |
-34619711 |
EDIT3: Sorry guys, your answers opened my eyes and made me rethink my problem as it seems no for loop at all is needed here. I added a ‘ranking’ column to my dataframes, shuffled the ‘vega’ column in df2 and added the ‘rho’ column to the resulting dataframe. (I actually have more columns in the dataframes, but they are all used in the same way as either ‘gamma’, ‘blabla’, or ‘ranking’ columns.)
Is this still possible using the merge solutions you guys came up with? Again, my apologies for the first incomplete question.
zelda = df2.blabla(b)
rho = df1.ranking(a)
df1 = pd.DataFrame({'vega': [bla1, bla2, bla3, bla4], 'gamma': [242340.1, 466530.2, 325346, 34654365, ‘blabla’: [regina, cactus, galileo, viking], ‘ranking’: [11, 34,65,46]]})
df2 = pd.DataFrame({'vega': [bla2, bla6, bla7, bla4], 'gamma': [3454334, 896530.2, 32543, 34654], ‘blabla’: [lucy, fletnix, ingrid, pablo], ‘ranking’: [45, 4,5,3]})
In this new example, the new dataframe would be:
delta| zelda| rho|vega
---------|-------|----|----
2987803,8|fletnix| 34|bla2
-34619711| pablo| 46|bla4
Upvotes: 1
Views: 837
Reputation: 4776
Another option, create a dataframe merged on the vega
column and add columns to it for your new data. Setting how='inner'
in the merge call keeps only the rows where that vega
value appeared in both df1
and df2
.
joint = df1.merge(df2, on='vega', how='inner')
gamma_x vega gamma_y
0 466530.2 bla2 896530.2
1 34654365.0 bla4 34654.0
Then pandas lets you easily subtract the values in one column from another:
joint['delta'] = joint['gamma_y'] - joint['gamma_x']
gamma_x vega gamma_y delta
0 466530.2 bla2 896530.2 430000.0
1 34654365.0 bla4 34654.0 -34619711.0
And you can select the rows that meet your criteria:
zelda = joint[(joint['delta'] > 0) | (joint['delta'] < 0)]['delta']
Yes, you can do what you want. Just try it out for yourself. When you merge the two dataframes together, the non-matching columns from the df1
will become colname_x
and the non-matching columns from df2
will become colname_y
. So you can select the columns you want based on that.
joint = df1.merge(df2, on='vega', how='inner')
joint['delta'] = joint['gamma_y'] - joint['gamma_x']
joint['rho'] = joint['ranking_x'] # select the ranking column from df1
joint['zelda'] = joint['blabla_y'] # select the blabla column from df2
joint[['delta', 'zelda', 'rho', 'vega']] # select only the columns from your example output
delta zelda rho vega
0 2987803.8 lucy 34 bla2
1 -34619711.0 pablo 46 bla4
Upvotes: 0
Reputation: 153460
Or you can use set_index
and subract dataframes using intrinsic index alignment nature of Pandas.
df1.set_index('vega')
df2.set_index('vega')
(df2 - df1).dropna().rename(columns={'gamma':'zelda'})
output:
zelda
vega
bla2 430000.0
bla4 -34619711.0
Upvotes: 1
Reputation: 210822
IIUC:
In [56]: df1.merge(df2, on='vega', suffixes=['1','2']) \
...: .eval("zelda = gamma2 - gamma1", inplace=False)[['zelda']]
...:
Out[56]:
zelda
0 430000.0
1 -34619711.0
Upvotes: 2
Reputation: 2894
Is this what you are looking for?
mask = df1.vega == df2.vega
delta = df1[mask].gamma - df2[mask].gamma
delta[delta != 0]
1 -430000.0
3 34619711.0
Name: gamma, dtype: float64
Upvotes: 1