Reputation: 6776
I want to delete a Riak bucket in order to purge old data from my system. I understand that there is no single Riak API to do this, but instead one deletes all the keys in the bucket, which effectively deletes it. Riak does provide an API to fetch all the keys, so this is fairly straightforward.
I found some code online to do this, but it was written in JavaScript and runs under Node. I want something in Python. This is probably a simple thing to do. Does anyone have any examples?
Upvotes: 6
Views: 3509
Reputation: 1149
If using the python riak-client is an option for you, this can be achieved with less code:
#!/usr/bin/python
import riak
riak_handle = riak.RiakClient(pb_port=8087, protocol='pbc')
riak_bucket = riak_handle.bucket('default')
for keys in riak_bucket.stream_keys():
    for key in keys:
        print('Deleting %s' % key)
        riak_bucket.delete(key)
You could adapt that to take the bucket name and port as command-line arguments if that is your primary use case.
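As a rough illustration of that adaptation, here is a minimal sketch that wraps the same loop in a function and reads the bucket name and port with `argparse`. The names `delete_all_keys`, `parse_args`, `main`, and the `--pb-port` option are made up for this example; only `RiakClient`, `bucket`, `stream_keys`, and `delete` come from the snippet above.

```python
import argparse


def delete_all_keys(bucket):
    """Delete every key in `bucket` and return how many were deleted."""
    deleted = 0
    # stream_keys() yields lists of keys, as in the snippet above
    for keys in bucket.stream_keys():
        for key in keys:
            bucket.delete(key)
            deleted += 1
    return deleted


def parse_args(argv=None):
    """Parse the (hypothetical) command-line options for this sketch."""
    parser = argparse.ArgumentParser(
        description='Delete all keys in a Riak bucket')
    parser.add_argument('bucket', help='name of the bucket to empty')
    parser.add_argument('--pb-port', type=int, default=8087,
                        help='protocol buffers port (default: 8087)')
    return parser.parse_args(argv)


def main(argv=None):
    # Imported here so the helpers above can be exercised without riak installed
    import riak
    args = parse_args(argv)
    client = riak.RiakClient(pb_port=args.pb_port, protocol='pbc')
    count = delete_all_keys(client.bucket(args.bucket))
    print('Deleted %d keys' % count)
```

To run it as a script, call `main()` under the usual `if __name__ == '__main__':` guard.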
Upvotes: 2
Reputation: 6776
As I said in the question, I figured this was pretty simple, especially with the requests library, so I wrote a script to do it. I started with Riak's keys=true (i.e. non-chunked) mode, but that failed on my larger buckets. I switched to chunked mode (keys=stream), but then the output was no longer a single JSON object; it was a series of concatenated objects (i.e. {...}{...}...{...}). A colleague provided a regex to split the individual JSON objects out of the aggregated Riak response, and I parse and process them sequentially. Not too bad. Here's the code:
#!/usr/bin/python
# Script to delete all keys in a Riak bucket (Python 2 syntax)
import json
import re
import requests
import sys

def processChunk(chunk):
    # Called by re.sub for each match; group 2 holds one complete JSON object
    global key_count
    obj = json.loads(chunk.group(2))
    if 'keys' in obj:
        for key in obj['keys']:
            r = requests.delete(sys.argv[1] + '/' + key)
            print 'delete key', key, 'response', r.status_code
            key_count += 1

if len(sys.argv) != 2:
    print 'Usage: {0} <http://riak_host:8098/riak/bucket_name>'.format(sys.argv[0])
    print 'Set riak_host and bucket_name appropriately for your Riak cluster.'
    sys.exit(1)

# Fetch the key list in chunked mode and buffer the whole response
r = requests.get(sys.argv[1] + '?keys=stream')
content = ''
key_count = 0
for chunk in r.iter_content():
    if chunk:
        content += chunk

# Split the concatenated JSON objects and delete the keys in each one
re.sub(r'(?=(^|})({.*?})(?={|$))', processChunk, content)
print 'Deleted', key_count, 'keys'
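One caveat with the regex split is that it can misfire if a key itself contains `}{`, and the script buffers the whole response before processing anything. As a possible alternative, here is a minimal Python 3 sketch that uses the standard library's `json.JSONDecoder.raw_decode` (not the regex above) to peel complete objects off a growing buffer as chunks arrive. The helper names `iter_json_objects` and `key_url` are invented for this example; `key_url` also percent-encodes the key, on the assumption that keys may contain characters such as `/` or spaces.

```python
import json
from urllib.parse import quote


def iter_json_objects(chunks):
    """Yield each JSON value from an iterable of text chunks that
    together form concatenated objects such as {...}{...}{...}."""
    decoder = json.JSONDecoder()
    buffer = ''
    for chunk in chunks:
        buffer += chunk
        while True:
            # raw_decode does not skip leading whitespace, so strip it first
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # The next object is still incomplete; wait for more data
                break
            yield obj
            buffer = buffer[end:]


def key_url(bucket_url, key):
    """Build the URL for one key, percent-encoding the key itself."""
    return bucket_url.rstrip('/') + '/' + quote(key, safe='')
```

The chunks could come from something like `requests.get(url, stream=True).iter_content(chunk_size=4096, decode_unicode=True)`, provided the response declares an encoding so that the chunks are text rather than bytes; that wiring is an assumption and is not shown here. Each yielded object can then be checked for a `'keys'` entry, and each key deleted with `requests.delete(key_url(bucket_url, key))`.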
While my problem is largely solved at this point, I suspect there are better solutions out there. I welcome people to add them on this page. I won't accept my own answer unless no alternatives are provided after a few weeks.
Upvotes: 11