srcerer

Reputation: 1098

Change dtypes in Pandas DataFrame cell-by-cell

Problem:

I have a pandas.DataFrame which stores only unicode values. Each column contains values that could be converted to an integer or a float, or left as unicode. (Python version 2.7.15, pandas version 0.23.0)

df = pd.DataFrame({'x':[u'1', u'1.23', u'', u'foo_text'], 'y':[u'bar_text', u'', u'2', u'4.56']})
print df
          x         y
0         1  bar_text
1      1.23
2                   2
3  foo_text      4.56

I would like to convert the type of each cell as follows:

  1. If possible to convert to int, convert to int
  2. Else if possible to convert to float, convert to float
  3. Else, leave as unicode

Solution attempts:

The following code does precisely what I want:

type_list = [int, float, unicode]
for column in df.columns:
    for index in df.index:
        for desired_type in type_list:
            try:
                df.loc[index,column] = desired_type(df.loc[index,column])
                break
            except ValueError:
                pass

The problem is that my actual DataFrame has >10 million cells, and this will take way too long to execute. I am trying to find a faster way to do this.

I have looked at pandas.DataFrame.infer_objects() and pandas.to_numeric(), but neither appears to handle the case of mixed types within a column.
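For example, pandas.to_numeric operates on a whole column at once: with errors='coerce' any value it cannot parse becomes NaN, so text entries are lost rather than left as unicode (a small illustration, not from the original post):

```python
import pandas as pd

# errors='coerce' parses what it can and turns the rest into NaN,
# so the text value is dropped instead of being kept as a string
s = pd.to_numeric(pd.Series([u'1', u'1.23', u'foo_text']), errors='coerce')
print(s.tolist())  # [1.0, 1.23, nan]
```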

Upvotes: 0

Views: 4249

Answers (1)

Jonathan Nelson

Reputation: 124

Try using a function along with .apply(), which will be much faster than three nested for loops.

So something like:

def change_dtype(value):
    # Try int first, then float, then fall back to the original value
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value

for column in df.columns:
    df.loc[:, column] = df[column].apply(change_dtype)
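As a quick check (using str in place of unicode, as on Python 3), applying the function to a single column leaves an object column holding mixed Python types, which is the behavior the question asks for:

```python
import pandas as pd

def change_dtype(value):
    # int first, then float, then leave the value unchanged
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value

s = pd.Series([u'1', u'1.23', u'', u'foo_text']).apply(change_dtype)
print([type(v).__name__ for v in s])  # ['int', 'float', 'str', 'str']
```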

Upvotes: 3
