newspaint

Documenting Problems That Were Difficult To Find The Answer To

Getting To The Bottom Of Why A PhantomJS Page Load Fails

For this post I’m using PhantomJS version 1.9.

Quite frustratingly, I occasionally have a call to page.open() where my callback receives a status of “fail”. This isn’t very helpful because it doesn’t say what went wrong. Was it an SSL handshake problem (using the --ignore-ssl-errors=true command line argument may solve such problems)? Something else?

Unfortunately, the PhantomJS API doesn’t currently appear to offer a way to find out why a page failed to load. However, there are a number of callbacks we can hook into to generate plenty of debugging messages, and these let us work out the reason for the failure.

Simplified Reason Tracking

Just before calling page.open() (and after creating the page variable), add the following code:

    page.onResourceError = function(resourceError) {
        page.reason = resourceError.errorString;
        page.reason_url = resourceError.url;
    };

Now you can print out the reason for a problem in your page.open() callback, e.g.:

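    var page = require('webpage').create();

    // Remember the most recent resource error so the page.open() callback
    // can report it. This is called for every resource that fails to load.
    page.onResourceError = function(resourceError) {
        page.reason = resourceError.errorString;
        page.reason_url = resourceError.url;
    };

    var url = "http://www.nosuchdomain/";

    page.open(
        url,
        function (status) {
            if ( status !== 'success' ) {
                // Fall back to the requested URL if no resource error was recorded.
                console.log(
                    "Error opening url \"" + ( page.reason_url || url )
                    + "\": " + ( page.reason || "unknown reason" )
                );
                phantom.exit( 1 );
            } else {
                console.log( "Successful page open!" );
                phantom.exit( 0 );
            }
        }
    );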

This script outputs the following:

    Error opening url "http://www.nosuchdomain/": Host www.nosuchdomain not found
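One caveat with the simplified approach: onResourceError fires for every resource that fails, including images, scripts and stylesheets, so page.reason ends up holding whichever error happened last, which is not necessarily the page you asked for. A minimal sketch, assuming you would rather keep every failure and report the one matching the requested URL, is to collect the errors in a list. The helper names recordResourceError and describeFailure are hypothetical; the id, url, errorCode and errorString fields are the ones PhantomJS documents for the resourceError argument. The code sticks to ES5 so it should also work inside PhantomJS 1.9.

```javascript
// Hypothetical helpers: keep every resource error instead of only the last one.
// Written in ES5 (var, function) so they can be pasted into a PhantomJS 1.9 script.

// Append the interesting fields of a PhantomJS resourceError object to a list.
function recordResourceError(errors, resourceError) {
    errors.push({
        id: resourceError.id,
        url: resourceError.url,
        errorCode: resourceError.errorCode,
        errorString: resourceError.errorString
    });
    return errors;
}

// Build a message for a failed page.open(): prefer the error whose URL matches
// the requested one (PhantomJS may report a normalised URL, so this can miss),
// otherwise fall back to the first error that was recorded.
function describeFailure(errors, requestedUrl) {
    if (errors.length === 0) {
        return 'Error opening url "' + requestedUrl + '": no resource error was reported';
    }
    var match = errors.filter(function (e) {
        return e.url === requestedUrl;
    })[0] || errors[0];
    return 'Error opening url "' + match.url + '": ' + match.errorString +
        ' (error code ' + match.errorCode + ')';
}

// Usage inside PhantomJS (not executed here):
// var errors = [];
// page.onResourceError = function (resourceError) {
//     recordResourceError(errors, resourceError);
// };
// ...and in the page.open() callback when status !== 'success':
// console.log(describeFailure(errors, url));
```

Falling back to the first error rather than the last is a judgement call: the first failure is often the root cause, while later ones may just be knock-on effects.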

Detailed Logging

Just before calling page.open() (and after creating the page variable), add the following code:

    // Log every outgoing request (id, method, URL, headers).
    page.onResourceRequested = function (request) {
        system.stderr.writeLine('= onResourceRequested()');
        system.stderr.writeLine('  request: ' + JSON.stringify(request, undefined, 4));
    };

    // Called twice per resource: stage "start" when the first byte arrives
    // and stage "end" when the resource has been fully received.
    page.onResourceReceived = function(response) {
        system.stderr.writeLine('= onResourceReceived()' );
        system.stderr.writeLine('  id: ' + response.id + ', stage: "' + response.stage + '", response: ' + JSON.stringify(response));
    };

    // Fired when the page starts loading; log the URL being navigated away from.
    page.onLoadStarted = function() {
        system.stderr.writeLine('= onLoadStarted()');
        var currentUrl = page.evaluate(function() {
            return window.location.href;
        });
        system.stderr.writeLine('  leaving url: ' + currentUrl);
    };

    // Fired when loading finishes; status is "success" or "fail".
    page.onLoadFinished = function(status) {
        system.stderr.writeLine('= onLoadFinished()');
        system.stderr.writeLine('  status: ' + status);
    };

    // Fired before a navigation happens (link click, form submission, reload, ...).
    page.onNavigationRequested = function(url, type, willNavigate, main) {
        system.stderr.writeLine('= onNavigationRequested()');
        system.stderr.writeLine('  destination_url: ' + url);
        system.stderr.writeLine('  type (cause): ' + type);
        system.stderr.writeLine('  will navigate: ' + willNavigate);
        system.stderr.writeLine('  from page\'s main frame: ' + main);
    };

    // Fired when any resource (the page itself or a sub-resource) fails to load.
    page.onResourceError = function(resourceError) {
        system.stderr.writeLine('= onResourceError()');
        system.stderr.writeLine('  - unable to load url: "' + resourceError.url + '"');
        system.stderr.writeLine('  - error code: ' + resourceError.errorCode + ', description: ' + resourceError.errorString );
    };

    // Fired for uncaught JavaScript errors inside the page, with a stack trace.
    page.onError = function(msg, trace) {
        system.stderr.writeLine('= onError()');
        var msgStack = ['  ERROR: ' + msg];
        if (trace) {
            msgStack.push('  TRACE:');
            trace.forEach(function(t) {
                msgStack.push('    -> ' + t.file + ': ' + t.line + (t.function ? ' (in function "' + t.function + '")' : ''));
            });
        }
        system.stderr.writeLine(msgStack.join('\n'));
    };

It is important that this block comes after the page and system variables are defined, e.g.:

    var system = require('system');
    var page = require('webpage').create();
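The onResourceReceived handler above dumps the whole response object, headers included, and it runs twice per resource (once with stage "start" and once with stage "end"), which gets noisy on a busy page. A hedged alternative, sketched below, is to log one summary line per resource at the "end" stage and flag HTTP statuses of 400 and above. The names summarizeResponse and shouldLogResponse are hypothetical; id, stage, status, statusText and url are documented fields of the PhantomJS response object. The sketch treats a non-numeric status as unknown rather than assuming every response carries one.

```javascript
// Hypothetical helper: turn a PhantomJS response object into one log line.
// ES5 only, so it can be pasted into a PhantomJS 1.9 script.
function summarizeResponse(response) {
    // Treat a missing or non-numeric status as unknown.
    var hasStatus = typeof response.status === 'number';
    var status = hasStatus ? String(response.status) : '(no status)';
    var flag = hasStatus && response.status >= 400 ? '  <-- HTTP error' : '';
    return '  id: ' + response.id + ', ' + status +
        (response.statusText ? ' ' + response.statusText : '') +
        ', url: ' + response.url + flag;
}

// Log each resource only once, when it has finished arriving.
function shouldLogResponse(response) {
    return response.stage === 'end';
}

// Usage inside PhantomJS (not executed here):
// page.onResourceReceived = function (response) {
//     if (shouldLogResponse(response)) {
//         system.stderr.writeLine('= onResourceReceived()');
//         system.stderr.writeLine(summarizeResponse(response));
//     }
// };
```

Keeping the full JSON dump is still useful when you need the headers; this version simply trades detail for readability.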

6 responses to “Getting To The Bottom Of Why A PhantomJS Page Load Fails”

  1. sdf January 18, 2014 at 6:05 pm

    thanks for --ignore-ssl-errors=true

  2. Marco Succi August 1, 2014 at 7:48 am

    Thanks so much! It’s not so easy to find answer to this kind of problems…;)

  3. bethesk October 29, 2014 at 4:54 am

    Thanks, this saved us oooodles of time!

  4. Ambush December 20, 2014 at 2:17 am

    +1 for --ignore-ssl-errors=true. Apparently a known problem for PhantomJS.

  5. Chao Fang August 9, 2016 at 12:37 am

    Both --ignore-ssl and error handling helped greatly to increase the speed of testing and debugging. Thanx a lot,
