Reputation: 136

Where does V8 actually consume the raw JavaScript code?

I'm trying to understand how V8 works, but I can't locate where in the code it actually receives the raw JS script as input in order to parse it and compile it into C++.

I've looked at api.cc and tried setting a breakpoint in the compile function below, but with no luck (I'm using Chromium to do so): the breakpoint is never hit.

MaybeLocal<Script> ScriptCompiler::Compile(Local<Context> context,
                                      Source* source,
                                      CompileOptions options,
                                      NoCacheReason no_cache_reason)

***** UPDATE ****

After @jmrk's reply, I've been trying to figure out where the JS actually starts coming in. What I'm really interested in is understanding how a website renders and then passes the script to V8 for compilation. I have found quite a lot of information on the topic, but I still can't see the whole picture:

Turns out the first step isn't the Parser but the Scanner, which gets a UTF-16 stream as an input.

The source code is first broken up in chunks; each chunk may be associated with a different encoding. A stream then unifies all chunks under the UTF-16 encoding.

Prior to parsing, the scanner then breaks up the UTF-16 stream into tokens. A token is the smallest unit of a script that has semantic meaning. There are several categories of tokens, including whitespace (used for automatic semicolon insertion), identifiers, keywords, and surrogate pairs (combined to make identifiers only when the pair is not recognized as anything else). These tokens are then fed first to the preparser and then to the parser.

https://blog.logrocket.com/how-javascript-works-optimizing-for-parsing-efficiency/

I have also found out it indeed gets this stream from Blink:

The UTF16CharacterStream provides a (possibly buffered) UTF-16 view over the underlying Latin1, UTF-8, or UTF-16 encoding that V8 receives from Chrome, which Chrome in turn received from the network. In addition to supporting more than one encoding, the separation between scanner and character stream allows V8 to transparently scan as if the entire source is available, even though we may only have received a portion of the data over the network so far.

https://v8.dev/blog/scanner
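
As a rough illustration of the idea in that quote, here is a toy sketch of a stream that accepts chunks in different encodings and exposes them as one sequence of UTF-16 code units. This is not V8's UTF16CharacterStream: the class name `ToyUtf16Stream` and its methods are made up for this example, and UTF-8 decoding and buffering of partially received data are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a UTF-16 character stream (hypothetical, NOT V8 code).
class ToyUtf16Stream {
 public:
  static constexpr std::int32_t kEndOfInput = -1;

  // Append a chunk of Latin1 bytes. Every Latin1 byte value equals its
  // Unicode code point, so widening to one 16-bit code unit is lossless.
  void AppendLatin1(const std::vector<std::uint8_t>& chunk) {
    for (std::uint8_t b : chunk) buffer_.push_back(static_cast<char16_t>(b));
  }

  // Append a chunk that is already encoded as UTF-16.
  void AppendUtf16(const std::u16string& chunk) { buffer_ += chunk; }

  // Return the next code unit, or kEndOfInput if nothing is left.
  std::int32_t Advance() {
    assert(pos_ <= buffer_.size());  // cursor never runs past the buffer
    if (pos_ == buffer_.size()) return kEndOfInput;
    return buffer_[pos_++];
  }

 private:
  std::u16string buffer_;  // all chunks, unified as UTF-16 code units
  std::size_t pos_ = 0;    // read cursor into buffer_
};
```

A consumer only ever calls `Advance()`, so it does not need to know which encoding each chunk arrived in.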

It also seems like the scanner feeds tokens to the parser:

V8’s parser consumes ‘tokens’ provided by the ‘scanner’. Tokens are blocks of one or more characters that have a single semantic meaning: a string, an identifier, an operator like ++. The scanner constructs these tokens by combining consecutive characters in an underlying character stream.
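
To show what "combining consecutive characters" into tokens can look like, here is a minimal toy scanner. It is only a sketch under simplifying assumptions: `ToyKind`, `ToyToken`, and `ToyScan` are hypothetical names, the input is plain ASCII in a `std::string` rather than a UTF-16 stream, and only identifiers, decimal integers, and one- or two-character operators are recognized. V8's real scanner handles far more cases.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for illustration only.
enum class ToyKind { kIdentifier, kNumber, kOperator };

struct ToyToken {
  ToyKind kind;
  std::string text;
};

// True if c may continue an identifier (ASCII letters, digits, '_' or '$').
static bool IsIdentifierPart(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}

// Combine consecutive characters into tokens, skipping whitespace.
std::vector<ToyToken> ToyScan(const std::string& src) {
  std::vector<ToyToken> tokens;
  std::size_t i = 0;
  while (i < src.size()) {
    const char c = src[i];
    if (std::isspace(static_cast<unsigned char>(c))) {
      ++i;
      continue;
    }
    const std::size_t start = i;
    ToyKind kind;
    if (std::isdigit(static_cast<unsigned char>(c))) {
      kind = ToyKind::kNumber;
      while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
    } else if (IsIdentifierPart(c)) {
      kind = ToyKind::kIdentifier;
      while (i < src.size() && IsIdentifierPart(src[i])) ++i;
    } else {
      kind = ToyKind::kOperator;
      ++i;
      // "++" and "--" are scanned as a single two-character token.
      if ((c == '+' || c == '-') && i < src.size() && src[i] == c) ++i;
    }
    assert(i > start);  // every token consumes at least one character
    tokens.push_back(ToyToken{kind, src.substr(start, i - start)});
  }
  return tokens;
}
```

For the input `count++ 42`, this yields three tokens: an identifier, an operator, and a number.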

But the question remains: where does the raw JavaScript code come in from Blink into V8? How can I see what Chrome reads, and where does it initialize V8?

Upvotes: 3

Answers (1)

jmrk

Reputation: 40491

It's complicated :-)

ScriptCompiler::Compile is generally correct as the outermost entrypoint. Note that there are two overloads of it. Additionally, Chrome tries to do streaming compilation when it can, which takes a different path. Also, when working with Chrome/Chromium, note that you have to set the breakpoints in the renderer processes, not the browser process.

It's easier to work with the d8 shell when poking around V8. Look for Shell::ExecuteString (which calls ScriptCompiler::Compile) in d8.cc.

Also, to clarify, V8 does not compile JavaScript to C++. It compiles it first to its own internal bytecode format, which is executed by the "Ignition" interpreter; hot functions are then later compiled to machine code by the "TurboFan" optimizing compiler.

Don't be discouraged if you have trouble understanding the whole pipeline. No single person does; V8 is too big and too complicated for that. Focus on what you're interested in (parser? interpreter? optimizing compiler?) and dig into that.

Upvotes: 6
