Reputation: 8655
I have a personal repository on GitHub that is completely written in C#, with a few XML configuration files, and some PowerShell files from included NuGet packages. On the main repository page, GitHub shows a colored bar to display the breakdown of different languages used in the repository.
If you click this bar, it shows the language names and their actual percentages.
This particular language breakdown seems a bit odd to me, since I am the only contributor, and I have never used Smalltalk.
If you click a language name, it will show you a list of the files using that language.
In this last image, you can see on the left side that the repository really only contains C#, XML, PowerShell, text and markdown files.
So why does GitHub think I'm using Smalltalk? And why doesn't the color bar mention that I'm using XML?
Upvotes: 1
Views: 634
Reputation: 13063
As Philip and VonC noted, GitHub uses Linguist to compute the language statistics.
So why does GitHub think I'm using Smalltalk?
Linguist relies first on the file extension to determine the language of a file. It then uses a set of refinement strategies for conflicting extensions (e.g., .cs is used by both Smalltalk and C#). These refinement strategies are not 100% accurate, and they can be especially unreliable for small files. Thus, files with conflicting extensions may be classified incorrectly.
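To illustrate the two-step idea described above (extension lookup first, then a content-based tie-breaker), here is a minimal Python sketch. It is not Linguist's code: the `CANDIDATES` table, the `looks_like_csharp` heuristic, and the `guess_language` function are all hypothetical names invented for this example, and the heuristic is deliberately naive to show how a tiny file can end up misclassified.

```python
from pathlib import PurePosixPath

# Hypothetical extension table: each extension maps to every language claiming it.
CANDIDATES = {
    ".cs": ["C#", "Smalltalk"],
    ".ps1": ["PowerShell"],
    ".xml": ["XML"],
}


def looks_like_csharp(text):
    """Toy content heuristic (an assumption, not Linguist's actual rules)."""
    markers = ("using ", "namespace ", "class ")
    return any(marker in text for marker in markers)


def guess_language(path, text):
    """Return a language name, or None if the extension is unknown."""
    extension = PurePosixPath(path).suffix.lower()
    candidates = CANDIDATES.get(extension, [])
    if not candidates:
        return None
    if len(candidates) == 1:
        # Unambiguous extension: no refinement needed.
        return candidates[0]
    # Conflicting extension: fall back to inspecting the content.
    return "C#" if looks_like_csharp(text) else "Smalltalk"
```

With this toy heuristic, a file containing `using System;` is labeled C#, while a nearly empty `.cs` file that contains only a comment has none of the markers and falls through to Smalltalk, which is the kind of failure described above.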
How can I fix it?
You can use Linguist overrides to tell Linguist that all .cs files in your repository are C# by adding the following line to a .gitattributes file:
*.cs linguist-language=C#
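If you would rather script that change, the following Python sketch appends the override line to .gitattributes only when it is not already there. The `ensure_override` helper and its `repo_root` parameter are hypothetical names for this example; the override line itself is the one shown above.

```python
from pathlib import Path

# The override line from the answer above.
OVERRIDE = "*.cs linguist-language=C#"


def ensure_override(repo_root, line=OVERRIDE):
    """Append `line` to <repo_root>/.gitattributes unless it is already present.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(repo_root) / ".gitattributes"
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if line in (entry.strip() for entry in existing.splitlines()):
        return False
    if existing and not existing.endswith("\n"):
        # Keep the new rule on its own line.
        existing += "\n"
    path.write_text(existing + line + "\n", encoding="utf-8")
    return True
```

The file still has to be committed and pushed like any other file in the repository before GitHub can take it into account.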
And why doesn't the color bar mention that I'm using XML?
Linguist only counts programming and markup languages in the statistics. XML is classified as a data language.
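As a rough illustration of why XML drops out of the bar, here is a small Python sketch that filters by language type before computing percentages. The `LANGUAGE_TYPE` table, the `breakdown` function, and the file sizes are made up for this example, and weighting by file size in bytes is an assumption of the sketch rather than something stated above.

```python
from collections import defaultdict

# Assumed type labels for this example (not read from Linguist itself).
LANGUAGE_TYPE = {
    "C#": "programming",
    "PowerShell": "programming",
    "Smalltalk": "programming",
    "XML": "data",
    "Markdown": "prose",
}

# Only these types contribute to the statistics.
COUNTED_TYPES = {"programming", "markup"}


def breakdown(files):
    """Take (language, size_in_bytes) pairs and return {language: percent}."""
    totals = defaultdict(int)
    for language, size in files:
        if LANGUAGE_TYPE.get(language) in COUNTED_TYPES:
            totals[language] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        language: round(100 * size / grand_total, 1)
        for language, size in totals.items()
    }
```

For example, `breakdown([("C#", 900), ("XML", 5000), ("PowerShell", 100)])` ignores the XML entry entirely, so the result only lists C# and PowerShell even though XML is the largest input.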
Why doesn't Smalltalk appear in the search results?
The search results are cached to avoid computing them every time you visit the page. They probably weren't up-to-date when you took the screenshot.
Upvotes: 2
Reputation: 1532
GitHub uses a heuristic to identify the language(s) of your repository. The underlying library is linguist. Misclassification is common enough that it is the first item in linguist's Troubleshooting section: "My repository is detected as the wrong language."
Upvotes: 1
Reputation: 1323753
Since GitHub is using linguist to detect languages, you can open a PR to report some files incorrectly tagged as "Smalltalk".
For instance, issue 2012 is still active (even though it is closed).
Upvotes: 0