Reputation: 105
I'm trying to read a CSV file that holds several values in every cell, and I want to encode each cell into a single integer to be stored in a pandas cell (e.g. (1, 1) -> 771). For that I would like to use the converters parameter of the read_csv
function. The problem is that I don't know the column names beforehand, and the value passed to converters must be a dict with the column names as keys. In fact, I want to convert all columns with the same converter function, so it would be better to write:
read_csv(fhand, converters=my_encoding_function)
than:
read_csv(fhand, converters={'col1': my_encoding_function,
                            'col2': my_encoding_function,
                            'col3': my_encoding_function})
Is something like that possible? Right now I work around the issue like this:
dataframe = read_csv(fhand)
enc_func = numpy.vectorize(encoder.encode_genotype)
dataframe = dataframe.apply(enc_func, axis=1)
But I guess this approach might be less efficient. By the way, I have similar doubts about the formatters parameter used by the to_string method.
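One way to keep the converters approach without knowing the names in advance is to read only the header row first, then build the dict from the discovered columns. A minimal sketch — the encode function below is a placeholder (it packs two "|"-separated ints into one integer), not the actual encoder.encode_genotype, and the separator is assumed:

```python
import io
import pandas as pd

csv_data = "c1;c2\n1|1;0|1\n0|0;1|0\n"

def encode(value):
    # placeholder encoder: pack two "|"-separated ints into one int
    a, b = (int(x) for x in value.split("|"))
    return (a << 8) | b

fh = io.StringIO(csv_data)
# read only the header to discover the column names
cols = pd.read_csv(fh, sep=";", nrows=0).columns
fh.seek(0)  # rewind (with a real file, reopen it instead)
# build a converters dict covering every column
df = pd.read_csv(fh, sep=";", converters={c: encode for c in cols})
```

Since converters run per value during parsing, this also avoids the second pass over the DataFrame that the apply workaround needs.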
Upvotes: 5
Views: 6610
Reputation: 105491
You can pass integers (0, 1, 2, ...) instead of the names. From the docstring:
converters : dict, optional
    Dict of functions for converting values in certain columns. Keys can
    either be integers or column labels
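So with positional keys, no names are needed — only the column count. A short sketch under the same assumptions as above (the encode function is a placeholder for the real encoder):

```python
import io
import pandas as pd

csv_data = "a,b,c\n1|1,0|0,1|0\n"

def encode(value):
    # placeholder encoder: pack two "|"-separated ints into one int
    a, b = (int(x) for x in value.split("|"))
    return (a << 8) | b

# integer keys address columns by position, so column names are not needed
df = pd.read_csv(io.StringIO(csv_data),
                 converters={i: encode for i in range(3)})
```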
Upvotes: 3