TMWP
TMWP

Reputation: 1625

How to do pandas equivalence of SQL outer join without a key

In SQL, you can join two tables without a key so that all records of both tables merge with each other. If pandas.concat() or pandas.merge() or some other pandas syntax supported this, it could help me with one step of a problem I am trying to solve. I found an outer join option on the help documentation, but I could not find an exact syntax to do what I wanted (join all records without a key).

To explain this a little better:

import pandas as pd

lunchmenupairs2 = [["pizza", "italian"],["lasagna", "italian"],["orange", "fruit"]]
teamcuisinepreferences2 = [["ian", "*"]]

lunchLabels = ["Food", "Type"]
teamLabels = ["Person", "Type"]

df1 = pd.DataFrame.from_records(lunchmenupairs2, columns=lunchLabels)
df2 = pd.DataFrame.from_records(teamcuisinepreferences2, columns=teamLabels)

print(df1)
print(df2)

Outputs these tables:

      Food     Type
0    pizza  italian
1  lasagna  italian
2   orange    fruit

  Person     Type
0    ian        *

I want the final result of the merge to be:

  Person     Type Food     Type
0  ian        *   pizza     italian
1  ian        *   lasagna   italian
2  ian        *   orange    fruit

Then I can easily drop the columns I don't want and move to the next step in the code I am working on. This doesn't work:

merged_data = pd.merge(left=df2,right=df1, how='outer')

Is there a way to do this type of DataFrame merging?

Upvotes: 11

Views: 9728

Answers (4)

Ran Feldesh
Ran Feldesh

Reputation: 1179

This is possible with cross-join, introduced in Pandas 1.2.0. Simply run:

df1.merge(df2, how='cross')

Upvotes: 9

citynorman
citynorman

Reputation: 5292

Building on @EFT answer, I often need some combo of values and dates, solution below. Can easily be generalized.

df1=pd.DataFrame({'ticker':['a','b']})
df2=pd.DataFrame({'date':pd.date_range('2010-01-01','2010-03-01',freq='1M')})
pd.DataFrame({'ticker':df1['ticker'].unique(),'key':np.nan}).merge(pd.DataFrame({'date':df2['date'].unique(),'key':np.nan}),on='key').drop('key',1)

  ticker       date
0      a 2010-01-31
1      a 2010-02-28
2      b 2010-01-31
3      b 2010-02-28

Upvotes: 0

EFT
EFT

Reputation: 2369

You can add a column to both dfs with a constant value,

>>>df1['joincol'] = 1
>>>df2['joincol'] = 1
>>>pd.merge(left=df2,right=df1, on='joincol', how='outer')
  Person Type_x  joincol     Food   Type_y
0    ian      *        1    pizza  italian
1    ian      *        1  lasagna  italian
2    ian      *        1   orange    fruit

then delete it afterward when you remove your other undesired columns.

Upvotes: 8

boot-scootin
boot-scootin

Reputation: 12515

I'm sure there's a better approach than the while method below, but this ought to do the trick:

df2_copy = df2.copy(deep=True)

# Basically repeat the same row in df2 until the lengths of the two dataframes are equal
while df2.shape[0] < df1.shape[0]:
    df2 = df2.append(df2_copy)

# Join on the indices of each dataframe ([0,1,2] to [0,1,2])
together = df1.join(df2, lsuffix='_df1', rsuffix='_df2').drop_duplicates()

# Fill 'ian' down
together['Person'] = together['Person'].ffill(axis=0)

# Fill type2 down
together['Type_df2'] = together['Type_df2'].ffill(axis=0)

together is now:

      Food Type_df1 Person Type_df2
0    pizza  italian    ian        *
1  lasagna  italian    ian        *
2   orange    fruit    ian        *

Upvotes: 0

Related Questions