vlade
vlade

Reputation: 173

Pandas - return multiple values from columnar apply

I have a column of strings/Nones that I'd like to convert to date. But, if there's an element that I can't convert to date (in my format), I'd like None there, but I also need to know it failed to convert(i.e. it's not None because it was there as None originally, but was incorrect entry to start with)

I could, of course, return say an error string in the conversion function, like:

import pandas as pd
import datetime

def convert(x):
    if x is None:
        return None
    try:
        d = datetime.datetime.strptime(x, '%Y%m%d')
    except:
        d = f"Error converting {x}"
    return d

s1 = pd.Series(["2021228", "2021228", "2021228", None, "x"])
s2 = s1.apply(convert)
print(s2)

0    2021-02-28 00:00:00
1    2021-02-28 00:00:00
2    2021-02-28 00:00:00
3                   None
4     Error converting x
dtype: object

But then I need to check for all the elements of the column that are error string, and, because I need them listed, would need to separate them into their own column/series.

Which TBH seems like a major pain to me, as it means I'd have to run map/apply or something similar again, so two passes for each column I need to verify.

Edit: And, in a more generic way, I don't like checking for a specific string in the first place. If it was an operation on strings, not dates, it'd be rather unreliable.

I have tried to return a Series from the apply, with labels value and error, but if there is an error (i.e. a None value in the value label) , all datetimes get converted into a float, and I can't figure out why (I looked into pandas' code, it marks it as mixed type, and I suspect does some conversion somewhere).

import pandas as pd
import datetime

def convert(x):
    v = None
    e = None
    if x is not None:
        try:
            v = datetime.datetime.strptime(x, '%Y%m%d')
        except:
            e = f"Error converting {x}"
    return pd.Series({"value":v, "error":e})

s1 = pd.Series(["2021228", "2021228", "2021228", None, "x"])
s2 = s1.apply(convert)
print(s2["value"])

0    1.614470e+18
1    1.614470e+18
2    1.614470e+18
3             NaN
4             NaN

Any ideas?

Upvotes: 0

Views: 51

Answers (1)

vlade
vlade

Reputation: 173

Actually, the easiest way seem to be to use transform(), with two functions, one which does the transformation and the other produces the error. I don't know whether it's the most efficient, but it works ok on the relatively small datasets I have.

something like

def value(x):
  return convert(x) if valid(x) else None

def error(x):
  return None if valid(x) else f"{x} is causing an error!"

s = s1.transform([value, error])

s["value"] then has the right values (or Nones), s["error"] has the errors.

Upvotes: 1

Related Questions