Kay

Reputation: 2332

Joining two dataframes using vaex

I am trying to join two DataFrames that were both imported with vaex. I think this should be simple, but I am having trouble with vaex expressions. Here's what I did:

vx_neighbors.join(vx_neighbours_df, on=['Neighbour', 'Year', 'day'])

and I got the error:

c:\python-3.8.2\lib\site-packages\vaex\join.py in join(df, other, on, left_on, right_on, lprefix, rprefix, lsuffix, rsuffix, how, allow_duplication, prime_growth, cardinality_other, inplace)
    145     left = left if inplace else left.copy()
    146 
--> 147     on = _ensure_string_from_expression(on)
    148     left_on = _ensure_string_from_expression(left_on)
    149     right_on = _ensure_string_from_expression(right_on)

c:\python-3.8.2\lib\site-packages\vaex\utils.py in _ensure_string_from_expression(expression)
    770         return expression.expression
    771     else:
--> 772         raise ValueError('%r is not of string or Expression type, but %r' % (expression, type(expression)))
    773 
    774 

ValueError: ['Neighbour', 'Year', 'day'] is not of string or Expression type, but <class 'list'>

How can I convert the list to a vaex expression?

Upvotes: 0

Views: 1655

Answers (1)

Ben Epstein

Reputation: 101

As @Joco mentioned, Vaex does not officially support joining on multiple columns. However, you can work around this by creating a single join-key column in each DataFrame and joining on that column.

import vaex

df1 = vaex.from_arrays(
    Neighbor=["bob", "alice", "jared"],
    Year=[2020, 2021, 2020],
    day=["M", "T", "T"]
)

df2 = vaex.from_arrays(
    Neighbor=["bob", "alice", "jared"],
    Year=[2021, 2021, 2020],
    day=["M", "T", "W"]
)

for d in [df1, df2]:
    d["join_key"] = d["Neighbor"] + "-" + d["Year"].astype(str) + "-" + d["day"]

joined = df1.join(df2, on="join_key", rsuffix="_R")
print(joined)  # in a Jupyter notebook, display(joined) renders it as a table

Notice that I'm casting the non-string columns (here `Year`) to string. This is required, because the key is built by string concatenation.
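The composite-key idea can be illustrated without vaex. The sketch below is only an illustration, not how vaex implements `join`: plain Python lists of dicts stand in for the two DataFrames, and `make_key`, `left_join` and `SEP` are hypothetical names. It reuses the example data above and also shows a caveat of the "-" separator: two different rows can produce the same key if a value itself contains "-", so a separator that cannot occur in the data (assumed here to be the ASCII unit separator `"\x1f"`) is safer.

```python
# Illustration only: lists of dicts stand in for the two vaex DataFrames.
# make_key, left_join and SEP are hypothetical names, not part of vaex.

SEP = "\x1f"  # ASCII "unit separator"; assumed never to appear in the data


def make_key(row, columns, sep=SEP):
    # str() plays the role of astype(str): the integer Year must become text
    # before it can be concatenated with the string columns.
    return sep.join(str(row[c]) for c in columns)


def left_join(left, right, columns, rsuffix="_R"):
    # Index right-hand rows by composite key (assumes the key is unique on the right).
    index = {make_key(r, columns): r for r in right}
    right_columns = list(right[0]) if right else []
    result = []
    for row in left:
        match = index.get(make_key(row, columns))
        out = dict(row)
        # Copy every right-hand column, suffixing names that clash with the left side;
        # unmatched rows get None, like the missing values of a left join.
        for name in right_columns:
            out[name + rsuffix if name in row else name] = match[name] if match else None
        result.append(out)
    return result


df1 = [
    {"Neighbor": "bob", "Year": 2020, "day": "M"},
    {"Neighbor": "alice", "Year": 2021, "day": "T"},
    {"Neighbor": "jared", "Year": 2020, "day": "T"},
]
df2 = [
    {"Neighbor": "bob", "Year": 2021, "day": "M"},
    {"Neighbor": "alice", "Year": 2021, "day": "T"},
    {"Neighbor": "jared", "Year": 2020, "day": "W"},
]

cols = ["Neighbor", "Year", "day"]
joined = left_join(df1, df2, cols)
matched = [r["Neighbor"] for r in joined if r["Neighbor_R"] is not None]
print(matched)  # ['alice']

# Why the separator matters: with "-" these two different rows give the same key.
a = {"Neighbor": "x-1", "Year": 2, "day": "M"}
b = {"Neighbor": "x", "Year": 1, "day": "2-M"}
print(make_key(a, cols, sep="-") == make_key(b, cols, sep="-"))  # True
print(make_key(a, cols) == make_key(b, cols))  # False
```

Only the `alice` row matches, because it is the only row where all three columns agree. The same collision risk applies to the vaex version, so if your string columns can contain "-", pick a different separator when building `join_key`.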

Upvotes: 1
