Reputation: 16395
I'm trying to figure out the best way to create an intake catalog from a collection of CSV files, where I want each CSV file to be an individual source.
I can create a catalog.yml for one CSV by doing:
import intake
source1 = intake.open_csv('states_1.csv')
source1.name = 'states1'
with open('catalog.yml', 'w') as f:
    f.write(str(source1.yaml()))
which produces the valid:
sources:
  states1:
    args:
      urlpath: states_1.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
but if I do
import intake
source1 = intake.open_csv('states_1.csv')
source1.name = 'states1'
source2 = intake.open_csv('states_2.csv')
source2.name = 'states2'
with open('catalog.yml', 'w') as f:
    f.write(str(source1.yaml()))
    f.write(str(source2.yaml()))
of course this fails because the catalog has a duplicate sources entry:
sources:
  states1:
    args:
      urlpath: states_1.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
sources:
  states2:
    args:
      urlpath: states_2.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
I'm guessing there must be a better way to go about this, like perhaps by instantiating a catalog object, adding source objects and then writing the catalog? But I couldn't find the methods to accomplish this.
What is the best practice for accomplishing this?
Upvotes: 1
Views: 237
Reputation: 1058
Try writing an empty catalog file, opening it with intake.open_catalog(), and adding your sources to it.
import intake
import yaml
# Write an empty catalog skeleton to disk
description = "Simple catalog for multiple CSV sources"
catalog = {'metadata': {'version': 1, 'description': description}, 'sources': {}}
with open('catalog.yml', 'w') as f:
    yaml.dump(catalog, f)
# Create a catalog object
catalog = intake.open_catalog('catalog.yml')
# Define your CSV sources
source1 = intake.open_csv('states_1.csv')
source1.name = 'states1'
source2 = intake.open_csv('states_2.csv')
source2.name = 'states2'
# Add the sources to the catalog
catalog = catalog.add(source1)
catalog = catalog.add(source2)
catalog.save('catalog.yml')
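If you would rather not go through a catalog object, another option is to merge the YAML text yourself so that only one sources header remains. The following is a minimal sketch using only the standard library. It assumes each string returned by source.yaml() starts with a top-level sources: line followed by indented entries, as in the output shown in the question. The helper name merge_source_yaml is hypothetical, and the two snippets are hard-coded stand-ins for source1.yaml() and source2.yaml().

```python
def merge_source_yaml(snippets):
    """Combine YAML snippets that each start with a top-level 'sources:' line.

    Emits a single 'sources:' header followed by the indented body of
    every snippet, in the order given.
    """
    merged = ["sources:"]
    for snippet in snippets:
        lines = snippet.strip("\n").splitlines()
        if not lines or lines[0].strip() != "sources:":
            raise ValueError("expected a snippet starting with 'sources:'")
        # Skip the per-snippet header, keep the indented entries.
        merged.extend(lines[1:])
    return "\n".join(merged) + "\n"


# Stand-ins for source1.yaml() and source2.yaml() (assumed format).
snippet1 = (
    "sources:\n"
    "  states1:\n"
    "    args:\n"
    "      urlpath: states_1.csv\n"
    "    description: ''\n"
    "    driver: intake.source.csv.CSVSource\n"
    "    metadata: {}\n"
)
snippet2 = snippet1.replace("states1", "states2").replace("states_1", "states_2")

catalog_text = merge_source_yaml([snippet1, snippet2])
print(catalog_text)
```

Writing catalog_text to catalog.yml should then give a file with both entries under one sources key. This is plain string handling, so it only holds up as long as the per-source YAML keeps the layout assumed above.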
Upvotes: 2