michaelAdam
michaelAdam

Reputation: 1137

How can I package or install an entire program to run in an AWS Lambda function

If this is a case of using Lambda entirely the wrong way, please let me know.

I want to install Scrapy into a Lambda function and invoke the function to begin a crawl. My first problem is how to install it, so that all of the paths are correct. I installed the program using the directory to be zipped as its root, so the zip contains all of the source files and the executable. I am basing my efforts on this article. In the line it says to include at the beginning of my function, where does the "process" variable come from? I have tried,

var process = require('child_process');
var exec = process.exec;
process.env['PATH'] = process.env['PATH'] + ':' + 
process.env['LAMBDA_TASK_ROOT']

but I get the error,

"errorMessage": "Cannot read property 'PATH' of undefined",
"errorType": "TypeError",

Do I need to include all of the library files, or just the executable from /usr/lib ? How do I include that one line of code the article says I need?

Edit: I tried moving the code into a child_process.exec, and received the error

"errorMessage": "Command failed: /bin/sh: process.env[PATH]: command not found\n/bin/sh: scrapy: command not found\n"

Here is my current, entire function

console.log("STARTING");
var process = require('child_process');
var exec = process.exec;

exports.handler = function(event, context) {    
    //Run a fixed Python command.
    exec("process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT']; scrapy crawl backpage2", function(error, stdout) {
        console.log('Scrapy returned: ' + stdout + '.');
        context.done(error, stdout);
    });

};

Upvotes: 6

Views: 5534

Answers (3)

toske
toske

Reputation: 1754

Problem with your example is modifying node global variable process by

var process = require('child_process');

This way you can't alter PATH environment variable, and thus the reason why you are getting Cannot read property 'PATH' of undefined.

Just use different name for loaded child_process library, e.g.

//this gets updated to child_process
var child_process = require('child_process');
var exec = child_process.exec;

//global process variable is still accessible
process.env['PATH'] = process.env['PATH'] + ':' + 
process.env['LAMBDA_TASK_ROOT']

//start new process with your binary
exec('path/to/your/binary',......

Upvotes: 4

Clay Fowler
Clay Fowler

Reputation: 2078

You can definitely execute arbitrary processes from your Node code in Lambda. We're doing this to run a game server that processes player turns, for example. I'm not sure exactly what you're trying to accomplish with Scrapy, but remember that your entire Lambda invocation can only live for a maximum of 60 seconds right now on AWS! But if that's OK for you, here is a completely working example of how we are executing our own arbitrary linux process from Lambda. (In our case, it's a compiled binary - really doesn't matter as long as you have something that can run on the Linux image they use).

var child_process = require('child_process');
var path = require('path');

exports.handler = function (event, context) {
    // If timeout is provided in context, get it. Otherwise, assume 60 seconds
    var timeout = (context.getRemainingTimeInMillis && (context.getRemainingTimeInMillis() - 1000)) || 60000;
    // The task root is the directory with the code package.
    var taskRoot = process.env['LAMBDA_TASK_ROOT'] || __dirname;
    // The command to execute.
    var command;

    // Set up environment variables
    process.env.HOME = '/tmp'; // <-- for naive processes that assume $HOME always works! You might not need this.

    // On linux the executable is in task root / __dirname, whichever was defined
    process.env.PATH += ':' + taskRoot;
    command = 'bash -c "cp -R /var/task/YOUR_THING /tmp/; cd /tmp; ./YOUR_THING ARG1 ARG2 ETC"'

    child_process.exec(command, {
        timeout: timeout,
        env: process.env
    }, function (error, stdout, stderr) {
        console.log(stdout);
        console.log(stderr);
        context.done(null, {exit: true, stdout: stdout, stderr: stderr});
    });
};

Upvotes: 7

James
James

Reputation: 11931

Here is a Lambda function that runs a python script setting the current working directory to the same directory as the Lambda function. You may be able to use this with some modifications to the relative location of your python script.

var child_process = require("child_process");

exports.handler = function(event, context) {
    var execOptions = {
        cwd: __dirname
    };
    child_process.exec("python hello.py", execOptions, function (error, stdout, stderr) {
        if (error) {
            context.fail(error);
        } else {
            console.log("stdout:\n", stdout);
            console.log("stderr:\n", stderr);
            context.succeed();
        }
    });
};

Upvotes: 1

Related Questions