Reputation: 53
I want to import some data from the Dutch statistics databank CBS, and I need to select all the municipalities. Each one has a code that starts with GM followed by 4 digits.
Do I have to type them all in, or is there a quicker way to select them all at once?
import pandas as pd
import cbsodata

# Download a selection of the data
data = pd.DataFrame(
    cbsodata.get_data('70072ned',
                      filters="RegioS eq 'GM0003', 'GM0004', 'GM0005'",
                      select=['RegioS', 'Vrouwen_3', 'Mannen_2']))
print(data.head())
Upvotes: 0
Views: 153
Reputation: 5648
It looks like you can use OData querying here. Try the following to get the idea, then change GM019 to just GM and it will return what you need.
pd.DataFrame(
    cbsodata.get_data('70072ned',
                      filters="startswith(RegioS, 'GM019')"))
This returns every row whose RegioS starts with GM019.
The same data, restricted to the selected columns:
pd.DataFrame(
    cbsodata.get_data('70072ned',
                      filters="startswith(RegioS, 'GM019')",
                      select=['RegioS', 'Vrouwen_3', 'Mannen_2']))
Side note: when I requested everything (no filters or select), the dataset wasn't large, but it still took a couple of minutes to download.
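For the original question (all municipalities), the same call can be wrapped so the prefix is a parameter. This is only a minimal sketch, assuming the CBS endpoint accepts startswith as in the examples above; startswith_filter and fetch_by_prefix are hypothetical helper names, not part of cbsodata.

```python
def startswith_filter(column, prefix):
    """Return an OData filter string matching values of `column` that begin with `prefix`."""
    return f"startswith({column}, '{prefix}')"


def fetch_by_prefix(table_id, prefix, columns=None, column="RegioS"):
    """Download the rows of a CBS table whose `column` value starts with `prefix`.

    Hypothetical convenience wrapper around cbsodata.get_data.
    """
    # Imported here so the filter helper works without these packages installed.
    import pandas as pd
    import cbsodata

    return pd.DataFrame(
        cbsodata.get_data(table_id,
                          filters=startswith_filter(column, prefix),
                          select=columns))


print(startswith_filter("RegioS", "GM"))  # startswith(RegioS, 'GM')
```

Calling fetch_by_prefix('70072ned', 'GM', ['RegioS', 'Vrouwen_3', 'Mannen_2']) should then request all municipalities; it needs network access and, given the timing above, may take a while.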
Upvotes: 1
Reputation: 2936
I'm not sure how cbsodata.get_data works, but it seems to me that you could generate filters.
filters = "RegioS eq " + ", ".join(["'GM" + str(i).zfill(4) + "'" for i in range(3, 8)])
This will give you:
"RegioS eq 'GM0003', 'GM0004', 'GM0005', 'GM0006', 'GM0007'"
You can then pass this as the filters variable.
Example:
filters = "RegioS eq " + ", ".join(["'GM" + str(i).zfill(4) + "'" for i in range(3, 8)])
data = pd.DataFrame(
    cbsodata.get_data(
        "70072ned",
        filters=filters,
        select=["RegioS", "Vrouwen_3", "Mannen_2"],
    )
)
print(data.head())
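One caveat: in OData, eq compares a field against a single value, so the service may reject a comma-separated list like the one above. A variant that joins one comparison per code with or is sketched below. I have not run it against CBS, and municipality_filter is a hypothetical helper name, not something cbsodata provides.

```python
def municipality_filter(numbers, column="RegioS"):
    """Build an OData filter that matches any of the given municipality numbers."""
    # Format each number as GM plus four zero-padded digits, e.g. 3 -> GM0003.
    codes = [f"GM{n:04d}" for n in numbers]
    # One `eq` comparison per code, combined with `or`.
    return " or ".join(f"{column} eq '{code}'" for code in codes)


filters = municipality_filter(range(3, 6))
print(filters)
# RegioS eq 'GM0003' or RegioS eq 'GM0004' or RegioS eq 'GM0005'
```

The resulting string can be passed as filters=filters in the same get_data call. For a long list of codes the filter gets long, so a prefix match is probably the more practical route.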
Upvotes: 1