Fernando Silva

Reputation: 182

TensorFlow TFDV does not work with specific NaN values

I'm using TensorFlow Data Validation (TFDV) to generate statistics from my data and infer a schema to feed into TFX.

I couldn't find any option to specify which values count as NaN. In pandas, for example, read_csv has an "na_values" parameter that lets you specify which values are treated as NaN when reading the data.

I've looked through the entire TFDV documentation but couldn't find an equivalent.

tfdv.generate_statistics_from_csv(
    data_location,
    column_names=None,
    delimiter=',',
    output_path=None,
    stats_options=options.StatsOptions(),
    pipeline_options=None
)

options.StatsOptions() only holds options for generating statistics, such as sample_count, sample_rate and so on.

To me it doesn't make sense to read the data, deal with the missing values, save the data back to CSV or TFRecord, and only then import it into TFDV to generate the stats.

Upvotes: 0

Views: 353

Answers (1)

Paul Suganthan

Reputation: 86

As of TFDV 0.13.0, you can use the tfdv.generate_statistics_from_dataframe method to generate statistics from a pandas DataFrame. If your data fits in memory, you can read the CSV file with pandas.read_csv (specifying na_values) and then pass the resulting DataFrame to that method.
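
A minimal sketch of that approach, assuming the data fits in memory. The sample CSV, the "-999" and "missing" markers, and the helper name read_csv_with_custom_nans are made up for illustration; only pandas.read_csv(na_values=...), tfdv.generate_statistics_from_dataframe and tfdv.infer_schema are real API calls. The TFDV step is wrapped in a try/except so the pandas part still runs where TFDV is not installed.

```python
import io

import pandas as pd

# Hypothetical sample data: "-999" and "missing" stand in for
# dataset-specific NaN markers (assumptions for this sketch).
CSV_TEXT = "age,city\n34,Lisbon\n-999,Porto\n51,missing\n"


def read_csv_with_custom_nans(source, na_values):
    """Read a CSV, treating the given tokens as missing values."""
    return pd.read_csv(source, na_values=na_values)


# In practice, pass the path to the CSV file instead of a StringIO buffer.
df = read_csv_with_custom_nans(
    io.StringIO(CSV_TEXT), na_values=["-999", "missing"]
)

try:
    import tensorflow_data_validation as tfdv

    # Generate statistics straight from the DataFrame (TFDV >= 0.13.0),
    # then infer a schema from them.
    stats = tfdv.generate_statistics_from_dataframe(df)
    schema = tfdv.infer_schema(statistics=stats)
except ImportError:
    # TFDV is not installed; only the pandas step was exercised.
    stats = schema = None
```

This avoids writing an intermediate CSV or TFRecord file: the custom markers are converted to NaN at read time, and TFDV sees them as missing values in the DataFrame.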

Upvotes: 0
