How to enable a header in text files loaded with load_dataset in Hugging Face?

from datasets import load_dataset
dataset = load_dataset('text', data_files='my_file.txt')

This text file already contains a header row. How do I indicate this to the module (say, header = True, as with pandas read_csv())?

Also, how do I specify that it is tab- or comma-separated?

Is there a way to present this data in tabular format?

Upvotes: 0

Answers (3)

Reputation: 9678

You can read it with pandas and then convert it to a dataset:

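  from datasets import Dataset
  import pandas as pd

  path = 'my_file.txt'  # the file from the question
  # read_table assumes tab-separated values and takes column names from the first line
  df = pd.read_table(path)
  ds = Dataset.from_pandas(df)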

Upvotes: 0

Reputation: 572

This is an old question, but for newcomers:

To read a TSV file:

from datasets import load_dataset

dataset = load_dataset("csv", data_files='path/to/your/file.tsv', delimiter='\t')

By default, it will infer the column names from the first line.

If your file doesn't have a header line and you want to specify the column names, use:

column_names = ['col1', 'col2', 'col3']
dataset = load_dataset("csv", data_files='path/to/your/file.tsv', delimiter='\t', column_names=column_names)

Upvotes: 2

Reputation: 19510

The csv loader uses pandas.read_csv() under the hood, so you can pass its parameters through load_dataset:

from datasets import load_dataset

a = load_dataset("csv", data_files="bla.tsv", sep="\t")

Upvotes: 0