Reputation: 2332
I am trying to join two data frames that were both imported by vaex. I think this should be simple, but I am having trouble with vaex expressions. Here's what I did:
vx_neighbors.join(vx_neighbours_df, on=['Neighbour', 'Year', 'day'])
and I got the error:
c:\python-3.8.2\lib\site-packages\vaex\join.py in join(df, other, on, left_on, right_on, lprefix, rprefix, lsuffix, rsuffix, how, allow_duplication, prime_growth, cardinality_other, inplace)
145 left = left if inplace else left.copy()
146
--> 147 on = _ensure_string_from_expression(on)
148 left_on = _ensure_string_from_expression(left_on)
149 right_on = _ensure_string_from_expression(right_on)
c:\python-3.8.2\lib\site-packages\vaex\utils.py in _ensure_string_from_expression(expression)
770 return expression.expression
771 else:
--> 772 raise ValueError('%r is not of string or Expression type, but %r' % (expression, type(expression)))
773
774
ValueError: ['Neighbour', 'Year', 'day'] is not of string or Expression type, but <class 'list'>
How can I convert the list to a vaex expression?
Upvotes: 0
Views: 1655
Reputation: 101
As @Joco mentioned, Vaex does not officially support joining on multiple columns. However, you can work around this by creating a single join-key column in each data frame and joining on that.
import vaex

df1 = vaex.from_arrays(
    Neighbor=["bob", "alice", "jared"],
    Year=[2020, 2021, 2020],
    day=["M", "T", "T"]
)
df2 = vaex.from_arrays(
    Neighbor=["bob", "alice", "jared"],
    Year=[2021, 2021, 2020],
    day=["M", "T", "W"]
)

# Build the same composite key column in both data frames.
for d in [df1, df2]:
    d["join_key"] = d["Neighbor"] + "-" + d["Year"].astype(str) + "-" + d["day"]

# Join on the single key column; overlapping right-hand columns get the "_R" suffix.
joined = df1.join(df2, on="join_key", rsuffix="_R")
display(joined)  # in a notebook; use print(joined) elsewhere
Notice that I'm casting the non-string columns to string; that's required for the string concatenation.
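One thing to watch with a join key built this way is the separator: if "-" can appear inside a value, two different rows can end up with the same key. Here is a minimal plain-Python sketch of the idea, with no vaex calls. The `make_key` helper and the choice of the ASCII unit separator are my own assumptions for illustration, and the left-join behaviour is what I believe vaex does by default.

```python
# Plain-Python sketch of the join-key idea (no vaex involved).
# SEP and make_key are hypothetical names used only for this illustration.
SEP = "\x1f"  # ASCII unit separator, assumed never to occur in the data


def make_key(*parts, sep=SEP):
    """Build one string key from several column values."""
    return sep.join(str(p) for p in parts)


# Same sample rows as above: (Neighbor, Year, day)
left = [("bob", 2020, "M"), ("alice", 2021, "T"), ("jared", 2020, "T")]
right = [("bob", 2021, "M"), ("alice", 2021, "T"), ("jared", 2020, "W")]

# Index the right-hand rows by their composite key.
right_by_key = {make_key(*row): row for row in right}

# Left join: keep every left row, attach the matching right row or None.
joined = [(row, right_by_key.get(make_key(*row))) for row in left]

# With "-" as separator, these two different rows produce the same key.
collides = make_key("a-b", "c", sep="-") == make_key("a", "b-c", sep="-")
```

With this sample data only the "alice" row finds a match, since the other two differ in `Year` or `day`. If your columns can contain "-", pick a separator in the vaex version that cannot occur in the data.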
Upvotes: 1