Getting a variable's name as a string in a Dask environment

I am trying to get the name of a variable as a string in a Dask environment. This code works fine in a normal Python environment, but it fails when I run it after creating a Dask dataframe. The code is below.

import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

client = Client()

df = pd.DataFrame({'A':['ant','ant','cherry', 'dog', 'ant'], 'B':['animal','animal1', 'fruit', 'animal', 'animal'], 'C':['small','small1','small', 'big', np.nan]})

ddf = dd.from_pandas(df, npartitions=2)

ddf.head()

# The code below raises an error once the code above has been run (see the error below). On its own, the code below runs fine.

my_var = [2,'wew','ewwew','44']
[ k for k,v in locals().items() if v == my_var][0]

EDIT

# Expected output (the name of the list). This is what I get in a Jupyter
# notebook without any modules loaded.
Out[]: 'my_var'

The error is below.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/env/lib/python3.5/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
    159     try:
--> 160         yield
    161     except Exception as e:

~/env/lib/python3.5/site-packages/dask/dataframe/core.py in elemwise(op, *args, **kwargs)
   3426         with raise_on_meta_error(funcname(op)):
-> 3427             meta = partial_by_order(*parts, function=op, other=other)
   3428 

~/env/lib/python3.5/site-packages/dask/utils.py in partial_by_order(*args, **kwargs)
    903         args2.insert(i, arg)
--> 904     return function(*args2, **kwargs)
    905 

~/env/lib/python3.5/site-packages/pandas/core/ops.py in f(self, other)
   2090 
-> 2091         other = _align_method_FRAME(self, other, axis=None)
   2092 

~/env/lib/python3.5/site-packages/pandas/core/ops.py in _align_method_FRAME(left, right, axis)
   1984         # GH17901
-> 1985         right = to_series(right)
   1986 

~/env/lib/python3.5/site-packages/pandas/core/ops.py in to_series(right)
   1946                 raise ValueError(msg.format(req_len=len(left.columns),
-> 1947                                             given_len=len(right)))
   1948             right = left._constructor_sliced(right, index=left.columns)

ValueError: Unable to coerce to Series, length must be 3: given 4

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-5-ce21a4e5e37e> in <module>
      1 my_var = [2,'wew','ewwew','44']
----> 2 [ k for k,v in locals().items() if v == my_var][0]

<ipython-input-5-ce21a4e5e37e> in <listcomp>(.0)
      1 my_var = [2,'wew','ewwew','44']
----> 2 [ k for k,v in locals().items() if v == my_var][0]

~/env/lib/python3.5/site-packages/dask/dataframe/core.py in <lambda>(self, other)
   1199             return lambda self, other: elemwise(op, other, self)
   1200         else:
-> 1201             return lambda self, other: elemwise(op, self, other)
   1202 
   1203     def rolling(self, window, min_periods=None, freq=None, center=False,

~/env/lib/python3.5/site-packages/dask/dataframe/core.py in elemwise(op, *args, **kwargs)
   3425                  else d._meta_nonempty for d in dasks]
   3426         with raise_on_meta_error(funcname(op)):
-> 3427             meta = partial_by_order(*parts, function=op, other=other)
   3428 
   3429     result = new_dd_object(graph, _name, meta, divisions)

/usr/lib/python3.5/contextlib.py in __exit__(self, type, value, traceback)
     75                 value = type()
     76             try:
---> 77                 self.gen.throw(type, value, traceback)
     78                 raise RuntimeError("generator didn't stop after throw()")
     79             except StopIteration as exc:

~/env/lib/python3.5/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
    175                 "{2}")
    176         msg = msg.format(" in `{0}`".format(funcname) if funcname else "", repr(e), tb)
--> 177         raise ValueError(msg)
    178 
    179 

ValueError: Metadata inference failed in `eq`.

Original error is below:
------------------------
ValueError('Unable to coerce to Series, length must be 3: given 4',)

Traceback:
---------
  File "/home/michael/env/lib/python3.5/site-packages/dask/dataframe/utils.py", line 160, in raise_on_meta_error
    yield
  File "/home/michael/env/lib/python3.5/site-packages/dask/dataframe/core.py", line 3427, in elemwise
    meta = partial_by_order(*parts, function=op, other=other)
  File "/home/michael/env/lib/python3.5/site-packages/dask/utils.py", line 904, in partial_by_order
    return function(*args2, **kwargs)
  File "/home/michael/env/lib/python3.5/site-packages/pandas/core/ops.py", line 2091, in f
    other = _align_method_FRAME(self, other, axis=None)
  File "/home/michael/env/lib/python3.5/site-packages/pandas/core/ops.py", line 1985, in _align_method_FRAME
    right = to_series(right)
  File "/home/michael/env/lib/python3.5/site-packages/pandas/core/ops.py", line 1947, in to_series
    given_len=len(right)))

Could anyone help me with this?

Thanks

Michael


Answers (1)

mdurant

Your code is a very odd thing to try to do! Since you iterate over all variables, you should not be surprised if what happens depends on which variables are defined. This particular error comes from evaluating ddf == my_var, and a dataframe has a very specific understanding of what equality means: it compares element-wise and tries to align your four-element list with its three columns, hence "length must be 3: given 4". This would happen for a pandas dataframe too, or indeed for anything that has a complex implementation of equality (__eq__).
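The failure can be reproduced without Dask or pandas. The sketch below uses a hypothetical stand-in class, FakeFrame, whose __eq__ imitates the DataFrame behaviour seen in the traceback: it aligns a list with its columns and raises ValueError when the lengths differ. The helper find_name_by_equality is also hypothetical; it is just the comprehension from the question wrapped in a function. One such object in the namespace is enough to break the whole scan, because the list comprehension evaluates every item before [0] is applied.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame with a fixed number of columns."""

    def __init__(self, n_columns):
        self.n_columns = n_columns

    def __eq__(self, other):
        # Imitate pandas: a list is aligned with the columns for an element-wise
        # comparison, and a length mismatch raises instead of returning False.
        if isinstance(other, list) and len(other) != self.n_columns:
            raise ValueError(
                "Unable to coerce to Series, length must be {}: given {}".format(
                    self.n_columns, len(other)))
        # Anything else: defer to Python's default (identity-based) comparison.
        return NotImplemented


def find_name_by_equality(namespace, target):
    # The approach from the question: test every value with ==.
    return [k for k, v in namespace.items() if v == target][0]


my_var = [2, 'wew', 'ewwew', '44']
plain = {'x': 1, 'my_var': my_var}
with_frame = {'ddf': FakeFrame(3), 'my_var': my_var}

print(find_name_by_equality(plain, my_var))  # my_var

try:
    find_name_by_equality(with_frame, my_var)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # Unable to coerce to Series, length must be 3: given 4
```

Writing my_var == v instead does not help: list.__eq__ returns NotImplemented for a non-list operand, so Python falls back to the reflected v.__eq__(my_var), which raises in the same way.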

You probably want to compare only against lists, since you know the type of what you are looking for (the "and" short-circuits, so == is never called on anything that is not a list):

[k for k, v in locals().items() if isinstance(v, list) and v == my_var][0]
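Another option, not part of the original answer, is to compare by identity with is instead of ==. Identity comparison never calls a user-defined __eq__, so dataframes in the namespace cannot interfere, and it also avoids picking up some other variable that merely holds an equal list. A minimal sketch, where names_of is a hypothetical helper name:

```python
def names_of(obj, namespace):
    """Return every name in `namespace` that is bound to exactly `obj`."""
    # `is` compares object identity, so no __eq__ method is ever called and
    # DataFrames (or any other objects) in the namespace cannot raise here.
    return [name for name, value in namespace.items() if value is obj]


my_var = [2, 'wew', 'ewwew', '44']
copy_of_var = list(my_var)  # equal contents, but a different object
alias = my_var              # the same object under a second name

print(names_of(my_var, globals()))  # ['my_var', 'alias']
```

At the top level of a notebook, locals() is the same dictionary as globals(), so names_of(my_var, locals()) works there too. Note that an object can be bound to several names, which is why the helper returns a list rather than a single name.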
