Why is Pandas doing this?

Question

I am applying a function to the column 'Outcome Date' that should change its date format.

import datetime
import os

import pandas as pd

def change_date_format(row):
    old_date = row['Outcome Date']
    old_date_reformatted = datetime.datetime.strptime(old_date, '%m/%d/%Y %H:%M').strftime('%Y-%m-%d %H:%M')
    row['Outcome Date'] = old_date_reformatted

    return row


ffn = os.path.join(new_ciq_root, filename)

in_df = pd.read_csv(ffn, encoding="ISO-8859-1")
in_df[col_name] = in_df.apply(lambda row: change_date_format(row), axis=1)

I put a breakpoint in the applied function. When it reaches the last line, that row's 'Outcome Date' appears to be reformatted correctly.
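
To reproduce what the breakpoint shows, here is a minimal check that the row function on its own does its job. The sample row and its values are made up for illustration; only the column names and the date formats come from the question.

```python
import datetime

import pandas as pd


def change_date_format(row):
    # Same logic as the question: parse 'MM/DD/YYYY HH:MM', write back 'YYYY-MM-DD HH:MM'
    old_date = row['Outcome Date']
    row['Outcome Date'] = datetime.datetime.strptime(
        old_date, '%m/%d/%Y %H:%M').strftime('%Y-%m-%d %H:%M')
    return row


# Hypothetical row standing in for one line of the CSV
row = pd.Series({'Outcome Type': 'Adoption', 'Outcome Date': '03/14/2019 09:30'})
fixed = change_date_format(row)
print(fixed['Outcome Date'])  # 2019-03-14 09:30
```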

But the end result is not a DataFrame with a correctly reformatted 'Outcome Date' column. Instead, 'Outcome Date' is replaced by the values from the 'Outcome Type' column. What am I doing wrong?

A possible hint: after every iteration, my debugger hits an exception in the following code in C:\Users\aidenm\AppData\Local\Programs\Python\Python37-32\Lib\site-packages\pandas\core\apply.py:

    def apply_standard(self):

        # try to reduce first (by default)
        # this only matters if the reduction in values is of different dtype
        # e.g. if we want to apply to a SparseFrame, then can't directly reduce

        # we cannot reduce using non-numpy dtypes,
        # as demonstrated in gh-12244
        if (self.result_type in ['reduce', None] and
                not self.dtypes.apply(is_extension_type).any()):

            # Create a dummy Series from an empty array
            from pandas import Series
            values = self.values
            index = self.obj._get_axis(self.axis)
            labels = self.agg_axis
            empty_arr = np.empty(len(index), dtype=values.dtype)
            dummy = Series(empty_arr, index=index, dtype=values.dtype)

            try:
                result = reduction.reduce(values, self.f,
                                          axis=self.axis,
                                          dummy=dummy,
                                          labels=labels)
                return self.obj._constructor_sliced(result, index=labels)
            except Exception:
                pass

Lior Cohen · Accepted Answer

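Because change_date_format returns the whole row, in_df.apply(..., axis=1) returns a new DataFrame containing all the columns, but you assign that DataFrame to the single column in_df[col_name].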
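
A minimal sketch of this point, using a small made-up frame with the question's column names: because the function returns the whole row, apply(..., axis=1) hands back a DataFrame with every column rather than a Series of new dates. Assigning that frame to one column is what goes wrong; depending on the pandas version, it either stores values from the wrong column (as in the question) or raises an error.

```python
import datetime

import pandas as pd


def change_date_format(row):
    # The question's row-wise function: it returns the entire (modified) row
    old_date = row['Outcome Date']
    row['Outcome Date'] = datetime.datetime.strptime(
        old_date, '%m/%d/%Y %H:%M').strftime('%Y-%m-%d %H:%M')
    return row


# Hypothetical data; only the column names come from the question
in_df = pd.DataFrame({
    'Outcome Type': ['Adoption', 'Transfer'],
    'Outcome Date': ['03/14/2019 09:30', '12/01/2018 17:05'],
})

result = in_df.apply(change_date_format, axis=1)
print(type(result).__name__)  # DataFrame
print(list(result.columns))   # ['Outcome Type', 'Outcome Date']
```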

You should assign the result back to the whole DataFrame, i.e. df = df.apply(...), not to a single column.
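
The row-wise fix, sketched on the same kind of made-up data: keep the function that works on rows, but assign the result of apply to the whole frame. The row.copy() at the top of the function is a defensive addition, not part of the original code, so the rows pandas passes in are never modified in place.

```python
import datetime

import pandas as pd


def change_date_format(row):
    # Defensive copy (an assumption, not in the question's code) so the row
    # object pandas passes in is left untouched
    row = row.copy()
    row['Outcome Date'] = datetime.datetime.strptime(
        row['Outcome Date'], '%m/%d/%Y %H:%M').strftime('%Y-%m-%d %H:%M')
    return row


# Hypothetical data; only the column names come from the question
in_df = pd.DataFrame({
    'Outcome Type': ['Adoption', 'Transfer'],
    'Outcome Date': ['03/14/2019 09:30', '12/01/2018 17:05'],
})

# apply(axis=1) returns a full DataFrame, so replace the whole frame with it
in_df = in_df.apply(change_date_format, axis=1)
print(in_df['Outcome Date'].tolist())  # ['2019-03-14 09:30', '2018-12-01 17:05']
print(in_df['Outcome Type'].tolist())  # ['Adoption', 'Transfer']
```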

Also, lambda row: change_date_format(row) does the same thing as passing change_date_format directly, so the lambda is redundant.
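
To see that the wrapper adds nothing, this sketch runs the same row function both ways on made-up data and compares the results. The function describe is hypothetical and exists only for this comparison.

```python
import pandas as pd


def describe(row):
    # Hypothetical row function used only to compare the two call styles
    return f"{row['Outcome Type']} on {row['Outcome Date']}"


in_df = pd.DataFrame({
    'Outcome Type': ['Adoption', 'Transfer'],
    'Outcome Date': ['03/14/2019 09:30', '12/01/2018 17:05'],
})

wrapped = in_df.apply(lambda row: describe(row), axis=1)  # extra lambda layer
direct = in_df.apply(describe, axis=1)                    # pass the function itself
print(wrapped.equals(direct))  # True
```
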
In your case, it is simpler and cheaper to apply the function to the single column (a Series) rather than to each row:

in_df[col_name] = in_df[col_name].apply(change_date_format1)

Here change_date_format1 receives a single value rather than a row, so it can be just:

def change_date_format1(x):
    return datetime.datetime.strptime(x, '%m/%d/%Y %H:%M').strftime('%Y-%m-%d %H:%M')
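
Putting the column-wise version together on made-up sample data (only the column names come from the question). The last part swaps in a vectorised approach that is not from the answer: pd.to_datetime with an explicit format parses the whole column at once and .dt.strftime formats it back, avoiding a Python function call per value.

```python
import datetime

import pandas as pd


def change_date_format1(x):
    # Works on one value (a date string), not on a row
    return datetime.datetime.strptime(x, '%m/%d/%Y %H:%M').strftime('%Y-%m-%d %H:%M')


# Hypothetical data; only the column names come from the question
in_df = pd.DataFrame({
    'Outcome Type': ['Adoption', 'Transfer'],
    'Outcome Date': ['03/14/2019 09:30', '12/01/2018 17:05'],
})
col_name = 'Outcome Date'

# Series.apply calls the function once per value of this column only
in_df[col_name] = in_df[col_name].apply(change_date_format1)
print(in_df[col_name].tolist())  # ['2019-03-14 09:30', '2018-12-01 17:05']

# Vectorised alternative (not from the answer): parse and format the whole column at once
raw = pd.Series(['03/14/2019 09:30', '12/01/2018 17:05'])
vectorised = pd.to_datetime(raw, format='%m/%d/%Y %H:%M').dt.strftime('%Y-%m-%d %H:%M')
print(vectorised.tolist())  # ['2019-03-14 09:30', '2018-12-01 17:05']
```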
