Just a learner

Reputation: 28642

How to search on GitHub to get exact string matches, including special characters

I can search exact matches from Google by using quotes like "system <<-".

How can I do the same thing for GitHub?

Upvotes: 492

Views: 273389

Answers (13)

Abhinav Srivastava

Reputation: 1

import requests
import re
from datetime import datetime

# Replace with your organization name and GitHub PAT
GITHUB_ORG = "your-org-name"
GITHUB_TOKEN = "your-github-token"
API_URL = f"https://api.github.com/orgs/{GITHUB_ORG}/repos"

# Headers for API authentication
HEADERS = {
    "Authorization": f"Bearer {GITHUB_TOKEN}"
}

# Configurable date (ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ)
LAST_COMMIT_DATE = "2024-01-01T00:00:00Z"

def get_public_repos():
    """Fetch all public repositories for the organization."""
    repos = []
    page = 1
    while True:
        # type=public restricts the listing to public repositories, matching the function name
        response = requests.get(f"{API_URL}?type=public&per_page=100&page={page}", headers=HEADERS)
        if response.status_code != 200:
            print(f"Error fetching repos: {response.json()}")
            break
        data = response.json()
        if not data:
            break
        repos.extend(data)
        page += 1
    return repos

def filter_repos_with_main_and_date(repos):
    """Filter repositories with a main branch and commits after a certain date."""
    filtered_repos = []
    for repo in repos:
        default_branch = repo.get("default_branch")
        if default_branch != "main":
            continue

        commits_url = repo["commits_url"].replace("{/sha}", "")
        response = requests.get(f"{commits_url}?per_page=1", headers=HEADERS)
        if response.status_code != 200:
            print(f"Error fetching commits for {repo['name']}: {response.json()}")
            continue

        commits = response.json()
        if not commits:
            continue

        latest_commit_date = commits[0]["commit"]["committer"]["date"]
        if latest_commit_date > LAST_COMMIT_DATE:
            filtered_repos.append(repo)
    return filtered_repos

def get_jenkinsfile_content(repo_name):
    """Fetch the content of Jenkinsfile if it exists in the main branch."""
    url = f"https://api.github.com/repos/{GITHUB_ORG}/{repo_name}/contents/Jenkinsfile?ref=main"
    response = requests.get(url, headers=HEADERS)
    if response.status_code == 200:
        content = response.json().get("content")
        return content.encode('ascii') if content else None
    return None

def find_java_version(jenkinsfile_content):
    """Extract Java version from the Jenkinsfile content."""
    import base64  # local import keeps this function self-contained

    try:
        # The contents API returns the file base64-encoded. Python 3 bytes have no
        # "base64" codec for .decode(), so decode with the base64 module instead.
        content = base64.b64decode(jenkinsfile_content).decode("utf-8")
        # Regex pattern to match Java version declarations
        java_version_pattern = r"(?:JAVA_HOME|jdk\s*=|java\s*['\"]?\d+).*?(\d+)"
        matches = re.findall(java_version_pattern, content, re.IGNORECASE)
        return matches if matches else "No Java version found"
    except Exception as e:
        print(f"Error decoding Jenkinsfile content: {e}")
        return "Error decoding"

def main():
    repos = get_public_repos()
    print(f"Found {len(repos)} public repositories.")

    filtered_repos = filter_repos_with_main_and_date(repos)
    print(f"{len(filtered_repos)} repositories have a main branch and commits after {LAST_COMMIT_DATE}.")

    for repo in filtered_repos:
        repo_name = repo["name"]
        print(f"Checking repository: {repo_name}")
        jenkinsfile_content = get_jenkinsfile_content(repo_name)
        if jenkinsfile_content:
            java_versions = find_java_version(jenkinsfile_content)
            print(f"Java version(s) in {repo_name}: {java_versions}")
        else:
            print(f"No Jenkinsfile found in {repo_name}")

if __name__ == "__main__":
    main()

Upvotes: 0

Matthew Read

Reputation: 1879

You can now do regex searches in GitHub using forward slashes rather than quotes, so you can match both exact strings and patterns. Try the search /system <<-/ for an exact match, or /system[\s]*<<-/ for any number of whitespace characters in the middle, for example!

This was a new option included in the 2023 rollout of GitHub's Code Search feature. See the announcement blog post and the official syntax documentation.
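To see the difference between the two patterns before running them on GitHub, here is a minimal sketch that tries both on a few sample lines. It uses Python's re module as a stand-in for GitHub's regex engine, which is an assumption; the sample lines and the helper name matching are made up for illustration.

```python
import re

# Python's `re` stands in for GitHub's regex engine here (an assumption).
exact = re.compile(r"system <<-")
flexible = re.compile(r"system[\s]*<<-")

# Hypothetical sample lines, not taken from any real repository.
samples = [
    "system <<-EOS",     # exactly one space
    "system<<-EOS",      # no whitespace
    "system \t <<-EOS",  # mixed whitespace
    "system << EOS",     # different operator, should never match
]


def matching(pattern, lines):
    """Return the lines in which `pattern` occurs anywhere."""
    return [line for line in lines if pattern.search(line)]


exact_hits = matching(exact, samples)
flexible_hits = matching(flexible, samples)
print(exact_hits)
print(flexible_hits)
```

Only the first sample satisfies the exact pattern, while the first three satisfy the whitespace-tolerant one.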

Upvotes: 4

DenisKolodin

Reputation: 15141

You couldn't (before 2022). The official GitHub searching rules:

Due to the complexity of searching code, there are a few restrictions on how searches are performed:

  • Only the default branch is considered. In most cases, this will be the master branch.
  • Only files smaller than 384 KB are searchable.
  • Only repositories with fewer than 500,000 files are searchable.
  • You must always include at least one search term when searching source code. For example, searching for language:go is not valid, while amazing language:go is.
  • At most, search results can show two fragments from the same file, but there may be more results within the file.
  • You can't use the following wildcard characters as part of your search query:
    . , : ; / \ ` ' " = * ! ? # $ & + ^ | ~ < > ( ) { } [ ]
    The search will simply ignore these symbols.
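As a rough illustration of the last restriction, the sketch below lists which characters of a query fall in the ignored set quoted above. The function name ignored_characters is hypothetical, and the code only mirrors the documented list; it makes no claim about how GitHub actually tokenized queries.

```python
# The ignored symbols, copied from the list quoted above.
IGNORED = set(". , : ; / \\ ` ' \" = * ! ? # $ & + ^ | ~ < > ( ) { } [ ]".split())


def ignored_characters(query):
    """Return the characters of `query` found in the ignored set, in order of first appearance."""
    seen = []
    for ch in query:
        if ch in IGNORED and ch not in seen:
            seen.append(ch)
    return seen


print(ignored_characters("system <<-"))
```

For the query from the question, the < characters are in the ignored set, which is why the old search could not match it exactly.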

Update: GitHub supports literal strings now, but you can also try some more powerful ways below.


Try Sourcegraph

For complex search with regex support try Sourcegraph.



Clone and use git-grep:

Git supports searching source files with the git grep command. Just clone the repository and run the command inside its folder:

git grep "text-to-search"

Alternatives:

I recommend trying ripgrep; it's fast and simple. It works like git grep, but the output looks nicer:

rg "text-to-search"

And you can use the standard grep to search any text in files:

grep -r "text-to-search" /repository
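If none of these tools is at hand, the same literal search over a cloned repository can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for grep: the function name search_tree is hypothetical, files are assumed to be UTF-8 text, and anything that fails to decode is skipped.

```python
from pathlib import Path


def search_tree(root, needle):
    """Yield (path, line_number, line) for each line under `root` that contains `needle` literally."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:  # plain substring test, so special characters need no escaping
                yield path, number, line
```

Calling list(search_tree("/repository", "system <<-")) would return every matching line with its file and line number.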

Upvotes: 263

Lukasz Dynowski

Reputation: 13700

The search query must be wrapped in forward slashes (/).

Example 1

To search for occurrences of query = """, use /query = """/

Example 2

To search for query = """ only in test_*.py files, use path:**/test_*.py /query = """/
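If you want to share such a search as a link, the query can be assembled and URL-encoded with a short script. This is a minimal sketch: the helper name code_search_url is hypothetical, and the pattern is passed through as a regex, so regex metacharacters you want matched literally must be escaped by the caller.

```python
from urllib.parse import urlencode


def code_search_url(pattern, path_glob=None):
    """Build a github.com code-search URL for a regex pattern, optionally limited by a path glob."""
    terms = []
    if path_glob:
        terms.append(f"path:{path_glob}")
    # Slashes delimit the regex, so a slash inside the pattern has to be escaped.
    terms.append("/" + pattern.replace("/", r"\/") + "/")
    query = " ".join(terms)
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


print(code_search_url('query = """', "**/test_*.py"))
```

The printed link opens the same search as Example 2.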

Notes

Use GitHub Search or Advanced GitHub Search. Be aware that the advanced search might still generate qualifiers that the new search does not recognize (e.g. filename:).

Upvotes: 1

Jacob Archambault

Reputation: 1036

As of 11/2/2021, this is possible by putting quotation marks around your search string

Without quotes: Searching chaos monkey on GitHub with unquoted terms

With quotes: Searching chaos monkey on GitHub with string

While it's now possible to search exact strings, the functionality doesn't yet support searching on non-alphanumeric characters. Example:

Searching chaos monkey on GitHub with question mark in quoted string

Upvotes: 3

Draex_

Reputation: 3484

  1. Open a repository on GitHub, for example microsoft/fluentui
  2. Press the dot key (".") to open the VS Code web interface
  3. Go to Search in the left panel
  4. Enable indexing via the prompt below the search bar
  5. Hooray! 🎉 Exact search works

UPDATE: As of November 2022, the solution above only works if you are signed in on GitHub.

You can enable the preview of the new search experience at this link: https://github.com/features/code-search-code-view/signup.

Then get an exact match simply by using quotes: "system <<-"

Upvotes: 26

VonC

Reputation: 1329592

You can: since December 2021, searches run from cs.github.com can include special characters.

Improving GitHub code search

(from Pavel Avgustinov)

Search for an exact string, with support for substring matches and special characters, or use regular expressions (enclosed in / separators).

So "system <<-" should work, on that new search site.

Upvotes: 16

garzj

Reputation: 3274

If you want to quickly search inside a specific repo, try this:

  • Press . while viewing the repo to open it inside a browser-based VS Code window
  • Enter your search term into the menu on the left
  • Enable indexing

Upvotes: -1

Jon Schneider

Reputation: 27003

If your search term is a filename or other substring which contains punctuation characters, a partial workaround to get GitHub's code search to return instances of that substring is to (1) replace the punctuation characters in your search term with spaces, and (2) enclose the search term in quotes.

For example, instead of using the search term:

  • repo:my_repo my_image_asset_1.svg

Try:

  • repo:my_repo "my image asset 1 svg"

This might not be a perfect solution in all cases; I imagine it might also match filenames like my-image-asset-1.svg. But depending on your use case, it might be "good enough"?
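The two steps can be sketched as a small helper. The function name punctuation_to_spaces is hypothetical, and treating every non-alphanumeric character as punctuation is an assumption about what needs replacing.

```python
import re


def punctuation_to_spaces(term):
    """Replace each run of non-alphanumeric characters with one space, then wrap the result in quotes."""
    words = re.sub(r"[^0-9A-Za-z]+", " ", term).strip()
    return f'"{words}"'


print("repo:my_repo " + punctuation_to_spaces("my_image_asset_1.svg"))
```

Note that my-image-asset-1.svg produces the same quoted term, which is exactly the ambiguity described above.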

Upvotes: 0

silviaegt

Reputation: 335

Adding to @mrgloom's answer: if you're looking for code in a specific programming language on GitHub using Google, you could do something like this in Google's search bar:

  • state the specific string you're looking for using the "intext:" search operator
  • add the programming language you're interested in, using the "ext:" operator (e.g. "ext:py", "ext:R", "ext:rb", etc.)
  • search in all public repos on GitHub using the "site:" operator mrgloom mentioned.

Example:

intext:"%% 2 == 0" ext:R site:github.com

Google Results from the example
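Putting the three operators together is plain string assembly. Here is a minimal sketch, where the function name google_github_query and its parameters are hypothetical:

```python
def google_github_query(text, extension=None, site="github.com"):
    """Combine the intext:, ext: and site: operators into one Google query string."""
    parts = [f'intext:"{text}"']
    if extension:
        parts.append(f"ext:{extension}")
    parts.append(f"site:{site}")
    return " ".join(parts)


print(google_github_query("%% 2 == 0", "R"))
```

This prints the same query as the example above; pass site="gist.github.com" to search gists instead.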

Upvotes: 10

Jan Katins

Reputation: 2319

If your package is in Debian, you can use Debian Code Search, which supports regular expressions: https://codesearch.debian.net/

Upvotes: 2

cessationoftime

Reputation: 906

Today I was trying to find an exact match of filter class in files named logback.xml in any repo on GitHub. I came up with the following query, which did the job:

"filter class" in:file filename:logback.xml

To enable exact matches with quotes, you need to follow your search with the "in:file" modifier. The matches are not quite exact: the word "class" has to follow the word "filter", but there can apparently be zero or more spaces or symbol characters between the two words.

Upvotes: 34

mrgloom

Reputation: 21682

You can use Google directly.

How about this?

"your_string_to_search" site:github.com
"your_string_to_search" site:gist.github.com

Upvotes: 69
