J.Do

Reputation: 428

None values after handling a ValueError in a request?

I am making requests to an API that POS-tags text, as follows:

import requests

def pos(text):
    payload = {'key': 'thekey', 'of': 'json', 'ilang': 'ES',
               'txt': text,
               'tt': 'a',
               'uw': 'y', 'lang': 'es'}

    r = requests.get('http://api.meaningcloud.com/parser-2.0', params=payload, stream=True)
    return r.json()

At first, it raised a ValueError:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-19-ac09c6405340> in <module>()
      1 
----> 2 df['tags'] = df['tweets'].apply(transform)
      3 df

/usr/local/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2292             else:
   2293                 values = self.asobject
-> 2294                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2295 
   2296         if len(mapped) and isinstance(mapped[0], Series):

pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:66124)()

<ipython-input-18-707ac7b399b4> in transform(a_lis)
     25 
     26 def transform(a_lis):
---> 27     analysis = pos(str(a_lis))
     28     a_list = parse_tree(analysis['token_list'], [])
     29     return a_list

<ipython-input-18-707ac7b399b4> in pos(text)
      8 
      9     r = requests.get('http://api.meaningcloud.com/parser-2.0', params=payload, stream = True)
---> 10     return r.json()
     11 
     12 def parse_tree(token, a_list):

/usr/local/lib/python3.5/site-packages/requests/models.py in json(self, **kwargs)
    864                     # used.
    865                     pass
--> 866         return complexjson.loads(self.text, **kwargs)
    867 
    868     @property

/usr/local/lib/python3.5/site-packages/simplejson/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
    514             parse_constant is None and object_pairs_hook is None
    515             and not use_decimal and not kw):
--> 516         return _default_decoder.decode(s)
    517     if cls is None:
    518         cls = JSONDecoder

/usr/local/lib/python3.5/site-packages/simplejson/decoder.py in decode(self, s, _w, _PY3)
    368         if _PY3 and isinstance(s, binary_type):
    369             s = s.decode(self.encoding)
--> 370         obj, end = self.raw_decode(s)
    371         end = _w(s, end).end()
    372         if end != len(s):

/usr/local/lib/python3.5/site-packages/simplejson/decoder.py in raw_decode(self, s, idx, _w, _PY3)
    398             elif ord0 == 0xef and s[idx:idx + 3] == '\xef\xbb\xbf':
    399                 idx += 3
--> 400         return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

So I handled the exception and applied the function to a pandas DataFrame column with:

df = pd.read_csv('../data.csv')
df['tagged_text'] = df['tweets'].apply(transform)

However, for some rows I got None:

text                                                tagged_text
Siento que estoy en un cuarto oscuro y hay sil...   [(sentar, VI-S1PSABL-N4), (que, CSSN9), (estar...
Los mejores de @UEoficial Sebastián Jaime, Sey...    None
#ColoColoJuegaEnEl13 la primera y adentro mier...    None
Juguito heladoooo de melón: me siento se...          None
@sxfiacrespo @lunasoledadhern Hola Luna...          [(@sxfiacrespo @lunasoledadhern, NPUU-N-), (ho...

So my question is: why am I getting None for some texts (rows), and how can I correctly tag those None instances? Note that I ran some tests and there is no problem with the text itself, since for those None rows a JSON response with all the tagged content is returned. For example, consider this function application.

Upvotes: 1

Views: 241

Answers (1)

Martijn Pieters

Reputation: 1121186

This does next to nothing:

except ValueError:
    np.nan

That only references the np.nan object. If you want to return it, you need to do so explicitly:

except ValueError:
    return np.nan

Otherwise the function just ends, which means None is returned.
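The difference can be reproduced offline with a minimal sketch. The function names here are hypothetical, json.loads on an empty string stands in for the failing r.json() call, and math.nan stands in for np.nan so that only the standard library is needed:

```python
import json
import math

def tag_without_return(body):
    try:
        return json.loads(body)  # raises a ValueError subclass on an empty body
    except ValueError:
        math.nan  # evaluated and discarded; the function falls through

def tag_with_return(body):
    try:
        return json.loads(body)
    except ValueError:
        return math.nan  # explicitly handed back to the caller

print(tag_without_return(""))  # None
print(tag_with_return(""))     # nan
```

With pandas, the second version is what would let the failed rows show up as NaN instead of None.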

Other notes:

r = requests.get('http://api.meaningcloud.com/parser-2.0', data=payload, stream = True)
json_data = json.dumps(r.json())
data = yaml.load(json_data)
return data

is a really expensive way of spelling

r = requests.post('http://api.meaningcloud.com/parser-2.0', data=payload)
return r.json()

Loading JSON into Python, then producing JSON again, then using a YAML parser to turn the JSON back to Python is somewhat excessive. I also removed the stream=True; that's only needed when you want to process the response data as a stream (which the response.json() method doesn't do).
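As a rough illustration of why the round trip adds nothing, the sketch below serializes an already-parsed structure and parses it again. The sample dictionary is made up, and json.loads stands in for yaml.load here only to keep the example within the standard library:

```python
import json

# Hypothetical stand-in for what r.json() already returns: plain Python objects.
parsed = {"token_list": [{"form": "hola", "tag": "NP"}]}

# Dump to a JSON string, then parse that string back into Python.
round_tripped = json.loads(json.dumps(parsed))

print(round_tripped == parsed)  # True
```

The result is the same data you started with, so the extra dump and load only cost time.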

According to the API documentation, txt is supposed to be a single string. I'd not use str(a_lis) to produce that; if you have a list of strings, just join those into one long string with ' '.join(a_lis). However, I'm sure that pandas.Series.apply() passes in individual values (e.g. strings) to your function, at which point there is no need to join anything at all (but your a_lis variable name is very confusing in that case).
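To see why str() is the wrong tool for a list, compare the two results on a small made-up list of words:

```python
words = ["Hola", "Luna"]  # hypothetical sample input

as_repr = str(words)        # the list's repr, brackets and quotes included
as_text = " ".join(words)   # the words themselves, separated by spaces

print(as_repr)  # ['Hola', 'Luna']
print(as_text)  # Hola Luna
```

Sending the first form would make the API tag the brackets, quotes and commas as if they were part of the text.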

The API also specifies that it uses POST requests (I'm surprised they accept GET still anyway). Using a POST request (requests.post()) will allow you to send much larger pieces of text for analysis. Use the data keyword. I've used the correct syntax in my last sample above.
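A small sketch of why GET limits the text size: with params, the whole payload is encoded into the URL, so the URL grows with the text. The helper name and sample texts are hypothetical, and the server's actual length limit is not something I know:

```python
from urllib.parse import urlencode

def get_url_length(text):
    # Build the URL roughly as a GET request with a query string would.
    payload = {"key": "thekey", "txt": text}
    url = "http://api.meaningcloud.com/parser-2.0?" + urlencode(payload)
    return len(url)

short_len = get_url_length("hola")
long_len = get_url_length("hola " * 2000)

print(short_len, long_len)
```

With POST and the data keyword, the same payload travels in the request body and the URL stays short.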

That you used GET is also the reason you get a ValueError:

>>> r = requests.get('http://api.meaningcloud.com/parser-2.0', params=payload)
>>> r.status_code
414
>>> r.reason
'Request-URI Too Long'
>>> r = requests.post('http://api.meaningcloud.com/parser-2.0', data=payload)
>>> r.status_code
200
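If you want to guard against error responses like that one, a possible approach is to check the status code before decoding. This is only a sketch: json_or_nan is a hypothetical helper, FakeResponse is a made-up stand-in for requests.Response so the example runs without network access, and math.nan stands in for np.nan:

```python
import json
import math

class FakeResponse:
    """Hypothetical stand-in exposing the two members of requests.Response used below."""
    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body

    def json(self):
        return json.loads(self._body)

def json_or_nan(response):
    # An error page is usually not JSON, so skip decoding on non-200 responses.
    if response.status_code != 200:
        return math.nan
    try:
        return response.json()
    except ValueError:
        return math.nan

print(json_or_nan(FakeResponse(200, '{"token_list": []}')))  # {'token_list': []}
print(json_or_nan(FakeResponse(414, "")))                    # nan
```

With the real library you would pass in the result of requests.post(url, data=payload) instead of a FakeResponse.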

Upvotes: 2
