Jamie Lee
Jamie Lee

Reputation: 83

creating a dataframe with a nested loop

I’m new to pandas so forgive me if this is an easy question. I would like to create a dataframe by using elements from two different dataframes. What is the best way to do this? It looks like I will have to use some kind of nested for loop:

for a in df1.itertuples()
    for b in df2.itertuples()
        if df1.vega(a) = df2.vega(b)
            delta = df2.gamma(a) – df1.gamma(b)
                if delta > 0 or delta <0
                    zelda = df2.blabla(b)

EDIT: both dataframes have the same column names:

df1 = pd.DataFrame({'vega': [bla1, bla2, bla3, bla4], 'gamma': [242340.1, 466530.2, 325346, 34654365]})
df2 = pd.DataFrame({'vega': [bla6, bla2, bla7, bla4], 'gamma': [3454334, 896530.2, 32543, 34654]})

EDIT2: In this example the new dataframe would be:

 zelda|
 ----- |
430000 |
-34619711 |

EDIT3: Sorry guys, your answers opened my eyes and made me rethink my problem as it seems no for loop at all is needed here. I added a ‘ranking’ column to my dataframes, shuffled the ‘vega’ column in df2 and added the ‘rho’ column to the resulting dataframe. (I actually have more columns in the dataframes, but they are all used in the same way as either ‘gamma’, ‘blabla’, or ‘ranking’ columns.)

Is this still possible using the merge solutions you guys came up with? Again, my apologies for the first incomplete question.

    zelda = df2.blabla(b)
    rho = df1.ranking(a)

df1 = pd.DataFrame({'vega': [bla1, bla2, bla3, bla4], 'gamma': [242340.1, 466530.2, 325346, 34654365, ‘blabla’: [regina, cactus, galileo, viking], ‘ranking’: [11, 34,65,46]]})
df2 = pd.DataFrame({'vega': [bla2, bla6, bla7, bla4], 'gamma': [3454334, 896530.2, 32543, 34654], ‘blabla’: [lucy, fletnix, ingrid, pablo], ‘ranking’: [45, 4,5,3]})

In this new example, the new dataframe would be:

    delta|  zelda| rho|vega
---------|-------|----|----
2987803,8|fletnix|  34|bla2
-34619711|  pablo|  46|bla4

Upvotes: 1

Views: 837

Answers (4)

xgord
xgord

Reputation: 4776

Another option, create a dataframe merged on the vega column and add columns to it for your new data. Setting how='inner' in the merge call keeps only the rows where that vega value appeared in both df1 and df2.

joint = df1.merge(df2, on='vega', how='inner')

      gamma_x  vega   gamma_y
0    466530.2  bla2  896530.2
1  34654365.0  bla4   34654.0


Then pandas lets you easily subtract the values in one column from another:

joint['delta'] = joint['gamma_y'] - joint['gamma_x']

      gamma_x  vega   gamma_y       delta
0    466530.2  bla2  896530.2    430000.0
1  34654365.0  bla4   34654.0 -34619711.0


And you can select the rows that meet your criteria:

zelda = joint[(joint['delta'] > 0) | (joint['delta'] < 0)]['delta']


Update: response to your edit 3

Yes, you can do what you want. Just try it out for yourself. When you merge the two dataframes together, the non-matching columns from the df1 will become colname_x and the non-matching columns from df2 will become colname_y. So you can select the columns you want based on that.

joint = df1.merge(df2, on='vega', how='inner')
joint['delta'] = joint['gamma_y'] - joint['gamma_x']
joint['rho'] = joint['ranking_x'] # select the ranking column from df1
joint['zelda'] = joint['blabla_y'] # select the blabla column from df2
joint[['delta', 'zelda', 'rho', 'vega']] # select only the columns from your example output

        delta  zelda  rho  vega
0   2987803.8   lucy   34  bla2
1 -34619711.0  pablo   46  bla4

Upvotes: 0

Scott Boston
Scott Boston

Reputation: 153460

Or you can use set_index and subract dataframes using intrinsic index alignment nature of Pandas.

df1.set_index('vega')
df2.set_index('vega')
(df2 - df1).dropna().rename(columns={'gamma':'zelda'})

output:

          zelda
vega            
bla2    430000.0
bla4 -34619711.0

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210822

IIUC:

In [56]: df1.merge(df2, on='vega', suffixes=['1','2']) \
    ...:    .eval("zelda = gamma2 - gamma1", inplace=False)[['zelda']]
    ...:
Out[56]:
        zelda
0    430000.0
1 -34619711.0

Upvotes: 2

Kyle
Kyle

Reputation: 2894

Is this what you are looking for?

mask = df1.vega == df2.vega
delta = df1[mask].gamma - df2[mask].gamma

delta[delta != 0]
1     -430000.0
3    34619711.0
Name: gamma, dtype: float64

Upvotes: 1

Related Questions