Console messages from the webpage aren't discarded by default


#1

The documentation claims that “by default, console messages from the web page are not displayed.” This is incorrect: by default they are written to stdout, where they can get mixed up with anything else written to stdout, such as the text of the web page that you are trying to download.

Making matters worse, there are a whole bunch of console-message-related hook functions, all of which need to be overridden to prevent any junk from being written to stdout. Worse still, pop-up windows do not inherit their page hooks from the parent page, so you have to know to set them all in an onPageCreated hook, like this:

// Consume all console messages and pop-ups.
function p_onConsoleMessage (msg, linenum, sourceid) {}
function p_onError (msg, trace) {}
function p_onAlert (msg) {}
function p_onConfirm (msg) { return true; }
function p_onPrompt (msg) { return "fuzzy wuzzy"; }

// Set all hooks on (sub)page creation.
function p_onPageCreated(page) {
    page.onAlert               = p_onAlert;
    page.onConfirm             = p_onConfirm;
    page.onPrompt              = p_onPrompt;
    page.onConsoleMessage      = p_onConsoleMessage;
    page.onError               = p_onError;
    page.onPageCreated         = p_onPageCreated;
}

// Load a page and dump its contents to stdout.
var webpage = require("webpage");
var system = require("system");

if (system.args.length != 2) {
    console.error("usage: phantomjs", system.args[0], "URL");
    phantom.exit(1);
}

var page = webpage.create();
p_onPageCreated(page);

page.onLoadFinished = function (status) {
    if (status !== "success") {
        console.error("page load failed:", status);
        phantom.exit(1);
        return;  // phantom.exit() may not stop the script immediately
    }
    system.stdout.writeLine(page.content);
    phantom.exit(0);
};

page.open(system.args[1]);
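
If you would rather keep the messages than swallow them, the same hook pattern can forward them to stderr. The following is only a sketch: `makeStderrHooks` and `applyHooks` are hypothetical helper names, not part of PhantomJS, and it assumes you pass in an object with a `writeLine` method (in PhantomJS that would presumably be `require("system").stderr`, assuming it behaves like the `system.stdout` used above).

```javascript
// Build a set of page hooks that forward every message to `err`.
// `err` is any object with a writeLine(string) method (an assumption;
// in PhantomJS this would be require("system").stderr).
function makeStderrHooks(err) {
    return {
        onConsoleMessage: function (msg, linenum, sourceid) {
            err.writeLine("console: " + msg);
        },
        onError: function (msg, trace) {
            err.writeLine("error: " + msg);
        },
        onAlert: function (msg) {
            err.writeLine("alert: " + msg);
        },
        onConfirm: function (msg) {
            err.writeLine("confirm: " + msg);
            return true;   // accept every confirmation dialog
        },
        onPrompt: function (msg) {
            err.writeLine("prompt: " + msg);
            return "";     // answer every prompt with an empty string
        }
    };
}

// Install the hooks on a page and on every page it creates later.
function applyHooks(page, hooks) {
    Object.keys(hooks).forEach(function (name) {
        page[name] = hooks[name];
    });
    // Pop-ups do not inherit hooks, so reapply them to each new child page.
    page.onPageCreated = function (child) {
        applyHooks(child, hooks);
    };
}
```

In the script above, this would replace the `p_onPageCreated(page)` call with something like `applyHooks(page, makeStderrHooks(system.stderr))`.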

I’d like to propose some changes to make this better. Some of these could break existing scripts, so we need to be cautious, but I think all of them are good ideas in principle.

  1. The default message handlers should write to stderr, not stdout. The documentation should be corrected to match.
  2. There should be a command-line option (--quiet, perhaps) that causes all console messages to be discarded.
  3. Whenever a new page is created for any reason, all of its page hooks should inherit from the parent page, if any. (Caution: page hooks should not inherit from the pseudopage that wraps the outer script environment.)
  4. Unless overridden by bind, all page hooks should be called with this set to the page object.
  5. The documentation should make clear whether or not onConsoleMessage is called for all types of console message.
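
To make proposals 3 and 4 concrete, here is a rough sketch of the intended semantics using plain objects in place of real pages. Nothing here is PhantomJS API: `HOOK_NAMES`, `createChildPage`, and `fireHook` are made-up names for illustration, and the caution about the outer-script pseudopage is not modelled.

```javascript
// Hook names taken from the script earlier in this post.
var HOOK_NAMES = ["onAlert", "onConfirm", "onPrompt",
                  "onConsoleMessage", "onError", "onPageCreated"];

// Proposal 3: a newly created page starts with its parent's hooks.
// (Hypothetical helper; real pages are created by PhantomJS itself.)
function createChildPage(parent) {
    var child = { name: parent.name + "/child" };
    HOOK_NAMES.forEach(function (hookName) {
        if (typeof parent[hookName] === "function") {
            child[hookName] = parent[hookName];
        }
    });
    return child;
}

// Proposal 4: invoke a hook with `this` set to the page it fired on.
// A function already bound with bind() keeps its bound `this`, because
// apply() cannot override a bound receiver.
function fireHook(page, hookName, args) {
    var hook = page[hookName];
    if (typeof hook !== "function") {
        return undefined;
    }
    return hook.apply(page, args);
}
```

With this model, a single handler set on the top-level page can tell which page a message came from by inspecting `this`, and a script that needs a fixed receiver can still opt out with bind.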

#2

Could you please open a PR for this on our repo (GitHub)? Thanks.


#3

I don’t use PhantomJS anymore, and I don’t know which GitHub repo is current. Please go ahead and file the bug yourself.


#4

Our repo is here. May I ask why you stopped using PhantomJS?


#5

It is no longer sufficiently up-to-date with web standards, and too many sites are specifically detecting and blocking it.


#6

Thank you so much! I’ve been struggling to capture JavaScript errors from a page loaded in PhantomJS. Your code is an excellent springboard to work from.

If you don’t mind, what are you using instead of PhantomJS?