I am trying to convert a variable name into a string in a dask environment. This code works fine in a normal Python environment. However, it does not work when I run it after creating a dask dataframe. The code is below.
```python
from dask.distributed import Client
client = Client()

import dask.dataframe as dd
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['ant', 'ant', 'cherry', 'dog', 'ant'],
                   'B': ['animal', 'animal1', 'fruit', 'animal', 'animal'],
                   'C': ['small', 'small1', 'small', 'big', np.nan]})
ddf = dd.from_pandas(df, npartitions=2)
ddf.head()

# The code below raises an error because of the code above (see the error below).
# On its own, it runs fine.
my_var = [2, 'wew', 'ewwew', '44']
[k for k, v in locals().items() if v == my_var][0]
```
EDIT: the expected output is the name of the list. This works in a Jupyter notebook with no modules loaded:

```
Out[]: 'my_var'
```
The error is below.
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/env/lib/python3.5/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
159 try:
--> 160 yield
161 except Exception as e:

~/env/lib/python3.5/site-packages/dask/dataframe/core.py in elemwise(op, *args, **kwargs)
3426 with raise_on_meta_error(funcname(op)):
-> 3427 meta = partial_by_order(*parts, function=op, other=other)
3428

~/env/lib/python3.5/site-packages/dask/utils.py in partial_by_order(*args, **kwargs)
903 args2.insert(i, arg)
--> 904 return function(*args2, **kwargs)
905

~/env/lib/python3.5/site-packages/pandas/core/ops.py in f(self, other)
2090
-> 2091 other = _align_method_FRAME(self, other, axis=None)
2092

~/env/lib/python3.5/site-packages/pandas/core/ops.py in _align_method_FRAME(left, right, axis)
1984 # GH17901
-> 1985 right = to_series(right)
1986

~/env/lib/python3.5/site-packages/pandas/core/ops.py in to_series(right)
1946 raise ValueError(msg.format(req_len=len(left.columns),
-> 1947 given_len=len(right)))
1948 right = left._constructor_sliced(right, index=left.columns)

ValueError: Unable to coerce to Series, length must be 3: given 4

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
<ipython-input-5-ce21a4e5e37e> in <module>
1 my_var = [2,'wew','ewwew','44']
----> 2 [ k for k,v in locals().items() if v == my_var][0]

<ipython-input-5-ce21a4e5e37e> in <listcomp>(.0)
1 my_var = [2,'wew','ewwew','44']
----> 2 [ k for k,v in locals().items() if v == my_var][0]

~/env/lib/python3.5/site-packages/dask/dataframe/core.py in <lambda>(self, other)
1199 return lambda self, other: elemwise(op, other, self)
1200 else:
-> 1201 return lambda self, other: elemwise(op, self, other)
1202
1203 def rolling(self, window, min_periods=None, freq=None, center=False,

~/env/lib/python3.5/site-packages/dask/dataframe/core.py in elemwise(op, *args, **kwargs)
3425 else d._meta_nonempty for d in dasks]
3426 with raise_on_meta_error(funcname(op)):
-> 3427 meta = partial_by_order(*parts, function=op, other=other)
3428
3429 result = new_dd_object(graph, _name, meta, divisions)

/usr/lib/python3.5/contextlib.py in __exit__(self, type, value, traceback)
75 value = type()
76 try:
---> 77 self.gen.throw(type, value, traceback)
78 raise RuntimeError("generator didn't stop after throw()")
79 except StopIteration as exc:

~/env/lib/python3.5/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
175 "{2}")
176 msg = msg.format(" in `{0}`".format(funcname) if funcname else "", repr(e), tb)
--> 177 raise ValueError(msg)
178
179

ValueError: Metadata inference failed in `eq`.
Original error is below:
------------------------
ValueError('Unable to coerce to Series, length must be 3: given 4',)

Traceback:
---------
  File "/home/michael/env/lib/python3.5/site-packages/dask/dataframe/utils.py", line 160, in raise_on_meta_error
    yield
  File "/home/michael/env/lib/python3.5/site-packages/dask/dataframe/core.py", line 3427, in elemwise
    meta = partial_by_order(*parts, function=op, other=other)
  File "/home/michael/env/lib/python3.5/site-packages/dask/utils.py", line 904, in partial_by_order
    return function(*args2, **kwargs)
  File "/home/michael/env/lib/python3.5/site-packages/pandas/core/ops.py", line 2091, in f
    other = _align_method_FRAME(self, other, axis=None)
  File "/home/michael/env/lib/python3.5/site-packages/pandas/core/ops.py", line 1985, in _align_method_FRAME
    right = to_series(right)
  File "/home/michael/env/lib/python3.5/site-packages/pandas/core/ops.py", line 1947, in to_series
    given_len=len(right)))
```
Would anyone be able to help me with this?

Thanks,
Michael
Your code is a very odd thing to try to do! Since you iterate over all variables, you should not be surprised if what happens depends on which variables are defined. The particular failure comes from evaluating `v == my_var` when `v` is your dataframe: a dataframe has a very specific understanding of what equality means, and comparing one with a four-element list makes it try to align the list against its three columns (hence "length must be 3: given 4"). This would happen for a pandas dataframe too, or indeed for anything that has a complex implementation of equality.
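A minimal pure-Python sketch of that mechanism follows. `PickyEquality` and `find_name_by_equality` are hypothetical names, not pandas or dask APIs: the class mimics only the one behaviour that matters here, an `__eq__` that raises when compared with a list of the wrong length instead of returning `False`.

```python
class PickyEquality:
    """Hypothetical stand-in for a dataframe with a fixed set of columns."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __eq__(self, other):
        if isinstance(other, list):
            # Like pandas, treat the list as one value per column and refuse
            # to compare when the lengths disagree.
            if len(other) != len(self.columns):
                raise ValueError("length must be {}: given {}".format(
                    len(self.columns), len(other)))
            # Unlike pandas (which returns a boolean frame), collapse to one bool.
            return all(c == o for c, o in zip(self.columns, other))
        return NotImplemented


def find_name_by_equality(namespace, target):
    # Evaluates `value == target` for every entry, so any object with an
    # unusual __eq__ in the namespace takes part in the search.
    return [name for name, value in namespace.items() if value == target][0]


my_var = [2, 'wew', 'ewwew', '44']
print(find_name_by_equality({'my_var': my_var}, my_var))  # my_var

try:
    # Same lookup, but a "frame" is now defined before my_var.
    find_name_by_equality({'frame': PickyEquality('ABC'), 'my_var': my_var}, my_var)
except ValueError as exc:
    print('ValueError:', exc)  # ValueError: length must be 3: given 4
```

Only the contents of the namespace differ between the two calls, which is exactly why the original lookup breaks once a dataframe has been created.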
You probably wanted to test only against lists, since you know what type you are looking for (`and` short-circuits, so the dataframe is never compared):
```python
[k for k, v in locals().items() if isinstance(v, list) and v == my_var][0]
```
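If the goal is really "which name is bound to this object", another option (a different technique from the type check, not the answer author's method) is to compare by identity with `is`, which never calls any object's `__eq__`. In this sketch, `name_of` and `RaisesOnEq` are hypothetical names, and the namespace is passed in explicitly, e.g. `globals()` at the top level of a notebook.

```python
def name_of(obj, namespace):
    """Return every name in `namespace` that is bound to this exact object."""
    # `is` tests object identity, so no __eq__ method (pandas, dask or
    # otherwise) is ever invoked during the search.
    return [name for name, value in namespace.items() if value is obj]


class RaisesOnEq:
    """Hypothetical object whose equality check always fails loudly."""

    def __eq__(self, other):
        raise ValueError('__eq__ should never be called')


my_var = [2, 'wew', 'ewwew', '44']
namespace = {'frame': RaisesOnEq(), 'my_var': my_var, 'copy': list(my_var)}
print(name_of(my_var, namespace))  # ['my_var']
```

The helper returns a list because several names can refer to the same object; in IPython, for instance, `_` may also point at a value that was just displayed. An equal but distinct copy (`copy` above) is deliberately not matched.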