Reputation: 3176
Lots of editors and IDEs have code completion. Some of them are very "intelligent", others not so much. I am interested in the more intelligent kind. For example, I have seen IDEs that only offer a function if a) it is available in the current scope and b) its return value is valid. (For example, after "5 + foo[tab]" it only offers functions that return something that can be added to an integer, or variable names of the correct type.) I have also seen that they place the most often used or longest option at the top of the list.
I realize you need to parse the code. But usually, while you are editing, the current code is invalid: there are syntax errors in it. How do you parse something when it is incomplete and contains errors?
There is also a time constraint. The completion is useless if it takes seconds to come up with a list. Sometimes the completion algorithm deals with thousands of classes.
What are the good algorithms and data structures for this?
Upvotes: 93
Views: 19628
Reputation: 64
The following link may help you further:
Syntax Highlighting: Fast Colored TextBox for Syntax Highlighting
Upvotes: 1
Reputation: 99869
The IntelliSense engine in my UnrealScript language service product is complicated, but I'll give the best overview I can here. The C# language service in VS2008 SP1 is my performance goal (for good reason). It's not there yet, but it's fast and accurate enough that I can safely offer suggestions after a single character is typed, without waiting for ctrl+space or for the user to type a "." (dot).
The more information people [working on language services] get about this subject, the better end-user experience I get should I ever use their products. There are a number of products I've had the unfortunate experience of working with that didn't pay such close attention to details, and as a result I was fighting with the IDE more than I was coding.
In my language service, it's laid out like the following:
1. Get the expression at the cursor. It is generally in the form aa.bb.cc, but can also contain method calls as in aa.bb(3+2).cc.
2. Get the context at the cursor. The context implements IDeclarationProvider, where you can call GetDeclarations() to get an IEnumerable<IDeclaration> of all items visible in the scope. In my case, this list contains the locals/parameters (if in a method), members (fields and methods, static only unless in an instance method, and no private members of base types), globals (types and constants for the language I'm working on), and keywords. In this list will be an item with the name aa. As a first step in evaluating the expression in #1, we select the item from the context enumeration with the name aa, giving us an IDeclaration for the next step.
3. Next, apply the operator to the IDeclaration representing aa to get another IEnumerable<IDeclaration> containing the "members" (in some sense) of aa. Since the "." operator is different from the "->" operator, I call declaration.GetMembers(".") and expect the IDeclaration object to correctly apply the listed operator.
4. This continues until I reach cc, where the declaration list may or may not contain an object with the name cc. As I'm sure you're aware, if multiple items begin with cc, they should appear as well. I solve this by taking the final enumeration and passing it through my documented algorithm to provide the user with the most helpful information possible.

Here are some additional notes for the IntelliSense backend:
- On GetMembers: each object in my cache is able to provide a functor that evaluates to its members, so performing complicated actions with the tree is near trivial.
- Instead of each object keeping a List<IDeclaration> of its members, I keep a List<Name>, where Name is a struct containing the hash of a specially-formatted string describing the member. There's an enormous cache that maps names to objects. This way, when I re-parse a file, I can remove all items declared in the file from the cache and repopulate it with the updated members. Due to the way the functors are configured, all expressions immediately evaluate to the new items.

IntelliSense "frontend"
As the user types, the file is syntactically incorrect more often than it is correct. As such, I don't want to haphazardly remove sections of the cache when the user types. I have a large number of special-case rules in place to handle incremental updates as quickly as possible. The incremental cache is kept local to an open file and helps ensure the user doesn't notice that their typing is causing the backend cache to hold incorrect line/column information for things like each method in the file.
Code snippet for the previous section:
    class A
    {
        int x; // linked to A
        void foo() // linked to A
        {
            int local; // linked to foo()
            // foo() ends here because bar() is starting
        void bar() // linked to A
        {
            int local2; // linked to bar()
        }
        int y; // linked again to A
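
The linking shown in the snippet can be approximated with one recovery rule: a method declaration seen while another method body is still open implicitly ends that body. The Python sketch below is only an illustration of that single rule, not the language service's actual code; `link_declarations` and the three regular expressions are hypothetical and deliberately crude (they would misread many real statements).

```python
import re
from typing import List, Optional, Tuple

# Hypothetical, deliberately crude patterns for a C-like toy language.
CLASS_RE = re.compile(r"^class\s+(\w+)")
METHOD_RE = re.compile(r"^\w+\s+(\w+)\s*\(")  # e.g. "void foo()"
FIELD_RE = re.compile(r"^\w+\s+(\w+)\s*;")    # e.g. "int x;"


def link_declarations(source: str) -> List[Tuple[str, Optional[str]]]:
    """Return (declaration, owner) pairs, tolerating missing closing braces."""
    links: List[Tuple[str, Optional[str]]] = []
    current_class: Optional[str] = None
    current_method: Optional[str] = None
    open_braces = 0  # braces opened inside the current method only

    for raw in source.splitlines():
        line = raw.split("//")[0].strip()  # drop line comments
        if not line:
            continue
        match = CLASS_RE.match(line)
        if match:
            current_class, current_method = match.group(1), None
            continue
        match = METHOD_RE.match(line)
        if match:
            # Recovery rule: methods cannot nest, so a new method
            # declaration ends whatever method was still open.
            current_method, open_braces = match.group(1), 0
            links.append((current_method + "()", current_class))
            continue
        match = FIELD_RE.match(line)
        if match:
            owner = current_method + "()" if current_method else current_class
            links.append((match.group(1), owner))
            continue
        if current_method and line == "{":
            open_braces += 1
        elif current_method and line == "}":
            open_braces -= 1
            if open_braces == 0:
                current_method = None  # method body closed normally
    return links
```

Run on the snippet above, this links `local` to `foo()`, `local2` to `bar()`, and `y` back to `A`, even though `foo()` never gets its closing brace.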
I figured I'd add a list of the IntelliSense features I've implemented with this layout. Pictures of each are located here.
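
To make the backend walk from earlier in this answer more concrete (find aa in scope, apply the operator, repeat until cc), here is a minimal Python sketch. It is not the real service: `Declaration`, `DeclarationCache`, and `complete_member_access` are hypothetical names, plain string keys stand in for the hashed Name struct, only the "." operator is handled, and method calls inside the expression are ignored.

```python
from typing import Dict, Iterable, Iterator, List


class Declaration:
    """Hypothetical stand-in for IDeclaration."""

    def __init__(self, name: str, member_keys: List[str]) -> None:
        self.name = name
        # Keys into the shared cache instead of direct object references.
        self.member_keys = member_keys


class DeclarationCache:
    """Maps keys to declarations and remembers which file declared them."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Declaration] = {}
        self._keys_by_file: Dict[str, List[str]] = {}

    def replace_file(self, path: str, declared: Dict[str, Declaration]) -> None:
        # On re-parse: drop everything the file declared, then repopulate.
        for key in self._keys_by_file.get(path, []):
            self._by_key.pop(key, None)
        self._by_key.update(declared)
        self._keys_by_file[path] = list(declared)

    def get_members(self, decl: Declaration) -> Iterator[Declaration]:
        # Lazy: keys are looked up only when iterated, so callers always
        # see the result of the most recent parse.
        for key in decl.member_keys:
            found = self._by_key.get(key)
            if found is not None:
                yield found


def complete_member_access(
    cache: DeclarationCache, scope: Iterable[Declaration], expression: str
) -> List[str]:
    """Resolve 'aa.bb.cc' step by step and list candidates for the last part."""
    *qualifiers, partial = expression.split(".")
    visible: Iterable[Declaration] = scope
    for name in qualifiers:
        match = next((d for d in visible if d.name == name), None)
        if match is None:
            return []  # broken chain: nothing sensible to offer
        visible = cache.get_members(match)
    return [d.name for d in visible if d.name.startswith(partial)]
```

Because declarations hold keys rather than object references, a declaration created before a re-parse still resolves to the members produced by the latest parse, which is the property the name cache is meant to provide.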
Upvotes: 71
Reputation: 400274
I can't say exactly what algorithms are used by any particular implementation, but I can make some educated guesses. A trie is a very useful data structure for this problem: the IDE can maintain a large trie in memory of all of the symbols in your project, with some extra metadata at each node.
When you type a character, it walks down a path in the trie. All of the descendants of a particular trie node are possible completions. The IDE then just needs to filter those down to the ones that make sense in the current context, and it only needs to compute as many as can be displayed in the tab-completion pop-up window.
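
As an illustration of the idea (an educated guess like the rest of this answer, not any IDE's actual code), here is a small Python sketch of a symbol trie with per-node metadata, a context filter, and an early cut-off. The names `SymbolTrie`, `kind`, `accept`, and `limit` are made up for the example.

```python
from typing import Callable, Dict, Iterator, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.symbol: Optional[str] = None  # set when a symbol ends here
        self.kind: Optional[str] = None    # hypothetical metadata


class SymbolTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, symbol: str, kind: str) -> None:
        node = self.root
        for ch in symbol:
            node = node.children.setdefault(ch, TrieNode())
        node.symbol = symbol
        node.kind = kind

    def _walk(self, node: TrieNode) -> Iterator[TrieNode]:
        # Depth-first walk over all descendants, in sorted order so the
        # result is deterministic. Being a generator, it stops as soon
        # as the caller stops asking for more.
        if node.symbol is not None:
            yield node
        for ch in sorted(node.children):
            yield from self._walk(node.children[ch])

    def complete(
        self,
        prefix: str,
        accept: Callable[[TrieNode], bool] = lambda node: True,
        limit: int = 10,
    ) -> List[str]:
        """Return up to `limit` symbols starting with `prefix` that pass `accept`."""
        node = self.root
        for ch in prefix:  # one step down the trie per typed character
            child = node.children.get(ch)
            if child is None:
                return []
            node = child
        results: List[str] = []
        for candidate in self._walk(node):
            if accept(candidate):
                results.append(candidate.symbol)
                if len(results) == limit:
                    break  # the pop-up can only show so many
        return results
```

For example, after inserting `foo` (function), `fooBar` (variable), and `food` (function), `complete("foo", accept=lambda n: n.kind == "function")` yields only the two functions.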
More advanced tab-completion requires a more complicated trie. For example, Visual Assist X has a feature whereby you only need to type the capital letters of CamelCase symbols -- e.g., if you type SFN, it shows you the symbol SomeFunctionName in its tab-completion window.
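
I don't know how Visual Assist X implements this, but one simple way to get the same effect is to index every symbol under the prefixes of its capital letters. The Python sketch below uses a plain dictionary for brevity; the same keys could just as well be inserted into the trie. `camel_initials` and `build_abbreviation_index` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


def camel_initials(symbol: str) -> str:
    """Return the capital letters of a CamelCase symbol, in order."""
    return "".join(ch for ch in symbol if ch.isupper())


def build_abbreviation_index(symbols: List[str]) -> Dict[str, List[str]]:
    """Map each prefix of a symbol's capital letters to that symbol."""
    index: Dict[str, List[str]] = defaultdict(list)
    for symbol in symbols:
        initials = camel_initials(symbol)
        # Register every non-empty prefix so that typing "SF" already
        # offers SomeFunctionName before the final "N" is typed.
        for end in range(1, len(initials) + 1):
            index[initials[:end]].append(symbol)
    return index
```

With `SomeFunctionName` and `SetFileName` indexed, looking up `"SFN"` (or just `"SF"`) returns both symbols; use `index.get(key, [])` for lookups so that misses do not add empty entries.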
Computing the trie (or other data structures) does require parsing all of your code to get a list of all of the symbols in your project. Visual Studio stores this in its IntelliSense database, an .ncb file stored alongside your project, so that it doesn't have to reparse everything every time you close and reopen your project. The first time you open a large project (say, one you just synced from source control), VS will take the time to parse everything and generate the database.
I don't know how it handles incremental changes. As you said, when you're writing code, it's invalid syntax 90% of the time, and reparsing everything whenever you idle would put a huge tax on your CPU for very little benefit, especially if you're modifying a header file included by a large number of source files.
I suspect that it either (a) only reparses whenever you actually build your project (or possibly when you close/open it), or (b) it does some sort of local parsing where it only parses the code around where you've just edited in some limited fashion, just to get the names of the relevant symbols. Since C++ has such an outstandingly complicated grammar, it may behave oddly in the dark corners if you're using heavy template metaprogramming and the like.
Upvotes: 17