feathj

Reputation: 3069

AWS dynamodb support for "R" programming language

Has anyone been able to successfully CRUD records in amazon dynamodb using the R programming language? I found this reference of language bindings supported:

http://aws.typepad.com/aws/2012/04/amazon-dynamodb-libraries-mappers-and-mock-implementations-galore.html

Alas, no R. We are considering using dynamodb for a large scale data project, but our main analyst is most comfortable in R, so we are exploring our options.

Upvotes: 7

Views: 3292

Answers (4)

David Kretch

Reputation: 390

For anyone who comes across this, there is now the Paws package, an AWS SDK for R. You can install it with install.packages("paws").

Disclaimer: I am a maintainer of the Paws package.

For example:

# Create a client object.
svc <- paws::dynamodb()

# This example retrieves an item from the Music table. The table has a
# partition key and a sort key (Artist and SongTitle), so you must specify
# both of these attributes.
item <- svc$get_item(
  Key = list(
    Artist = list(
      S = "Acme Band"
    ),
    SongTitle = list(
      S = "Happy Day"
    )
  ),
  TableName = "Music"
)

# This example adds a new item to the Music table.
svc$put_item(
  Item = list(
    AlbumTitle = list(
      S = "Somewhat Famous"
    ),
    Artist = list(
      S = "No One You Know"
    ),
    SongTitle = list(
      S = "Call Me Today"
    )
  ),
  ReturnConsumedCapacity = "TOTAL",
  TableName = "Music"
)

Upvotes: 2

Zerodf

Reputation: 2298

Cloudyr's aws.dynamodb is convenient for reading data from DynamoDB. However, it has an unfortunate tendency to coerce values to character. I have also had trouble using its put_item function to add anything but string data to DynamoDB.

AWS CLI works well. Example here:

$ aws dynamodb put-item --table-name "SOMETABLE" --item '{"aStringItem": {"S": "1900-01-02|myid"}, "aNumericItem": {"N": "2"}, "aMapItem": {"M": {"Source": {"S": "CLI"}}}}'
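The fiddly part of that command is the typed JSON passed to --item, where every value is wrapped in a type tag such as S, N, or M. Below is a minimal sketch of building that payload programmatically. The helper names to_attribute_value and put_item_command are made up for illustration and are not part of the AWS CLI or any SDK.

```python
import json


def to_attribute_value(value):
    """Wrap a plain Python value in DynamoDB's typed JSON form (hypothetical helper)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        # DynamoDB transmits numbers as strings under the "N" tag.
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if value is None:
        return {"NULL": True}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def put_item_command(table_name, item):
    """Return the argument list for `aws dynamodb put-item` (hypothetical helper)."""
    typed_item = {k: to_attribute_value(v) for k, v in item.items()}
    return [
        "aws", "dynamodb", "put-item",
        "--table-name", table_name,
        "--item", json.dumps(typed_item),
    ]


command = put_item_command(
    "SOMETABLE",
    {"aStringItem": "1900-01-02|myid", "aNumericItem": 2, "aMapItem": {"Source": "CLI"}},
)
print(command[-1])
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where the AWS CLI is installed and configured. The same typed structure is what you would need to assemble in R before shelling out to the CLI.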

Two other options that haven't been mentioned are Rcpp and rJava. There are native SDKs available in both Java and C++.

Upvotes: 1

CalZ

Reputation: 161

Here's a simplified version of what I'm using for reading data from DynamoDB into R. It relies on the fact that R and Python can exchange data, and that the Python library boto3 makes it really easy to get data from DynamoDB. It would be neat if this were all an R package, but I won't complain given the 25GB of free storage you can get from Amazon.

First, you need a Python script named query_dynamo.py, like so:

import boto3

dynamodb = boto3.resource('dynamodb',
                          aws_access_key_id='<GET ME FROM AWS>',
                          aws_secret_access_key='<ALSO GET ME FROM AWS CONSOLE>',
                          region_name='us-east-1')

table = dynamodb.Table('comment')  ###Your table name in DynamoDB here

response = table.scan()
data = response['Items']

while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])

Then in R you do this. If you're trying this on Windows, you may want to try rPython-win instead. I did all this on Ubuntu Linux 16.04 LTS.

library(rPython)


python.load("query_dynamo.py")
temp = as.data.frame(python.get('data'))
df = as.data.frame(t(temp))
rm(temp)

Now you'll have a dataframe called "df" with the contents of whatever you put in DynamoDB.
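The scan loop in the script is worth isolating, because a single scan call returns only one page of results. Here is a minimal sketch of that pagination pattern as a reusable function, plus a converter for the decimal.Decimal values that boto3's resource interface returns for numbers. The names scan_all, to_plain, and FakeTable are made up for illustration. FakeTable is only a stand-in so the loop can be run without AWS credentials.

```python
from decimal import Decimal


def scan_all(table, **scan_kwargs):
    """Collect every item by following LastEvaluatedKey until it is absent."""
    response = table.scan(**scan_kwargs)
    items = list(response["Items"])
    while "LastEvaluatedKey" in response:
        response = table.scan(
            ExclusiveStartKey=response["LastEvaluatedKey"], **scan_kwargs
        )
        items.extend(response["Items"])
    return items


def to_plain(value):
    """Recursively replace Decimal with int or float so the data serializes easily."""
    if isinstance(value, Decimal):
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_plain(v) for v in value]
    return value


class FakeTable:
    """Offline stand-in for a boto3 Table; serves pre-built pages of items."""

    def __init__(self, pages):
        self._pages = pages

    def scan(self, ExclusiveStartKey=None, **kwargs):
        index = 0 if ExclusiveStartKey is None else ExclusiveStartKey["page"]
        response = {"Items": self._pages[index]}
        if index + 1 < len(self._pages):
            response["LastEvaluatedKey"] = {"page": index + 1}
        return response


table = FakeTable([
    [{"id": "a", "score": Decimal("1")}],
    [{"id": "b", "score": Decimal("2.5")}],
])
data = to_plain(scan_all(table))
print(data)
```

With a real table you would pass dynamodb.Table('comment') instead of FakeTable. Converting Decimal values first may help if the bridge to R serializes the data as JSON, since Python's json module does not handle Decimal by default.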

Upvotes: 3

Julio Faerman

Reputation: 13501

There are several approaches to this... let me add two:

1- EMR with Hive and Streaming.

Hive can query DynamoDB, and its output can feed Hadoop Streaming, which works with any language that can read from standard input and write to standard output, including R.

Of course that would be very different from your typical R program and environment, but it would leverage the "big data" tools.

2- R-ish in the JVM

If you use an R interpreter for the JVM (such as Renjin), or a similar language on the JVM, you can use the AWS Java SDK and DynamoDB libraries directly. That may feel much more familiar to the developer, but you'd be responsible for handling the "bigness" of your data.
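To make the first approach more concrete, here is a minimal sketch of a Hadoop Streaming mapper. It is written in Python only to show the contract: read records line by line from standard input and write key/value lines to standard output. It assumes tab-separated input records with the key in the first column, which depends on how the Hive table is defined. The function name map_lines is made up for illustration.

```python
import sys


def map_lines(lines, key_index=0, sep="\t"):
    """Yield 'key<TAB>1' for each non-empty record (assumes sep-delimited input)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        yield f"{fields[key_index]}\t1"


if __name__ == "__main__":
    # Hadoop Streaming pipes each input split to stdin and collects stdout.
    for output_line in map_lines(sys.stdin):
        print(output_line)
```

An R script run with Rscript can follow the same contract by reading from stdin and writing delimited lines to stdout.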

Upvotes: 1
