Rich Signell

Reputation: 16395

How best to create an intake catalog from a collection of CSV files?

I'm trying to figure out the best way to create an intake catalog from a collection of CSV files, where I want each CSV file to be an individual source.

I can create a catalog.yml for one CSV by doing:

import intake
source1 = intake.open_csv('states_1.csv')
source1.name = 'states1'
with open('catalog.yml', 'w') as f:
    f.write(str(source1.yaml()))

which produces the valid:

sources:
  states1:
    args:
      urlpath: states_1.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}

but if I do

import intake
source1 = intake.open_csv('states_1.csv')
source1.name = 'states1'
source2 = intake.open_csv('states_2.csv')
source2.name = 'states2'
with open('catalog.yml', 'w') as f:
    f.write(str(source1.yaml()))
    f.write(str(source2.yaml()))

of course this fails because the catalog has a duplicate sources entry:

sources:
  states1:
    args:
      urlpath: states_1.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
sources:
  states2:
    args:
      urlpath: states_2.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}

I'm guessing there must be a better way to go about this, like perhaps by instantiating a catalog object, adding source objects and then writing the catalog? But I couldn't find the methods to accomplish this.

What is the best practice for accomplishing this?

Upvotes: 1

Views: 237

Answers (1)

Christoph

Reputation: 1058

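Try creating a catalog object and adding your sources to it. The example below writes an empty catalog file, opens it with intake.open_catalog(), and then adds each source with add():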

import intake
import yaml

# Write an empty catalog file to start from
description = "Simple catalog for multiple CSV sources"
catalog = {'metadata': {'version': 1, 'description': description}, 'sources': {}}
with open('catalog.yml', 'w') as f:
    yaml.dump(catalog, f)

# Open the empty catalog as a catalog object
catalog = intake.open_catalog('catalog.yml')

# Define your CSV sources
source1 = intake.open_csv('states_1.csv')
source1.name = 'states1'
source2 = intake.open_csv('states_2.csv')
source2.name = 'states2'

# Add the sources to the catalog
catalog = catalog.add(source1)
catalog = catalog.add(source2)

catalog.save('catalog.yml')
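
If you have many CSV files, another option is to build the catalog dictionary yourself and dump it once, which avoids the duplicate sources key from the question. This is only a rough sketch: the helper names build_catalog_dict and write_catalog are made up for illustration, the source names are derived from the file names by dropping underscores (my assumption, to match states1/states2), and the driver string is copied from the YAML shown in the question.

```python
from pathlib import Path


def build_catalog_dict(csv_paths, description=""):
    """Return a dict shaped like an intake catalog, with one source per CSV file."""
    sources = {}
    for path in csv_paths:
        # Hypothetical naming rule: file stem without underscores,
        # e.g. states_1.csv -> states1
        name = Path(path).stem.replace("_", "")
        sources[name] = {
            "driver": "intake.source.csv.CSVSource",  # taken from the question's YAML
            "description": "",
            "args": {"urlpath": str(path)},
            "metadata": {},
        }
    return {
        "metadata": {"version": 1, "description": description},
        "sources": sources,
    }


def write_catalog(csv_paths, out_path="catalog.yml", description=""):
    """Dump the catalog dict to a YAML file under a single 'sources' key."""
    import yaml  # PyYAML, imported here so build_catalog_dict needs only the stdlib

    with open(out_path, "w") as f:
        yaml.safe_dump(build_catalog_dict(csv_paths, description), f)
```

Calling write_catalog(['states_1.csv', 'states_2.csv']) should give a single catalog.yml with both sources, which you can then open with intake.open_catalog('catalog.yml').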

Upvotes: 2
