Reputation: 28642
I can search exact matches from Google by using quotes like "system <<-".
How can I do the same thing for GitHub?
Upvotes: 492
Views: 273389
Reputation: 1
import base64
import re

import requests

# Replace with your organization name and GitHub PAT
GITHUB_ORG = "your-org-name"
GITHUB_TOKEN = "your-github-token"
API_URL = f"https://api.github.com/orgs/{GITHUB_ORG}/repos"

# Headers for API authentication
HEADERS = {
    "Authorization": f"Bearer {GITHUB_TOKEN}"
}

# Configurable date (ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ)
LAST_COMMIT_DATE = "2024-01-01T00:00:00Z"


def get_public_repos():
    """Fetch all public repositories for the organization."""
    repos = []
    page = 1
    while True:
        # type=public restricts the listing to public repositories
        response = requests.get(f"{API_URL}?type=public&per_page=100&page={page}", headers=HEADERS)
        if response.status_code != 200:
            print(f"Error fetching repos: {response.json()}")
            break
        data = response.json()
        if not data:
            break
        repos.extend(data)
        page += 1
    return repos


def filter_repos_with_main_and_date(repos):
    """Filter repositories with a main branch and commits after a certain date."""
    filtered_repos = []
    for repo in repos:
        default_branch = repo.get("default_branch")
        if default_branch != "main":
            continue
        commits_url = repo["commits_url"].replace("{/sha}", "")
        response = requests.get(f"{commits_url}?per_page=1", headers=HEADERS)
        if response.status_code != 200:
            print(f"Error fetching commits for {repo['name']}: {response.json()}")
            continue
        commits = response.json()
        if not commits:
            continue
        latest_commit_date = commits[0]["commit"]["committer"]["date"]
        # Both values are UTC timestamps in the same ISO 8601 layout,
        # so comparing them as strings orders them chronologically.
        if latest_commit_date > LAST_COMMIT_DATE:
            filtered_repos.append(repo)
    return filtered_repos


def get_jenkinsfile_content(repo_name):
    """Fetch the Base64-encoded content of Jenkinsfile if it exists in the main branch."""
    url = f"https://api.github.com/repos/{GITHUB_ORG}/{repo_name}/contents/Jenkinsfile?ref=main"
    response = requests.get(url, headers=HEADERS)
    if response.status_code == 200:
        content = response.json().get("content")
        return content if content else None
    return None


def find_java_version(jenkinsfile_content):
    """Extract Java version from the Base64-encoded Jenkinsfile content."""
    try:
        # str.decode("base64") only existed in Python 2; use the base64 module instead
        content = base64.b64decode(jenkinsfile_content).decode("utf-8")
        # Regex pattern to match Java version declarations
        java_version_pattern = r"(?:JAVA_HOME|jdk\s*=|java\s*['\"]?\d+).*?(\d+)"
        matches = re.findall(java_version_pattern, content, re.IGNORECASE)
        return matches if matches else "No Java version found"
    except Exception as e:
        print(f"Error decoding Jenkinsfile content: {e}")
        return "Error decoding"


def main():
    repos = get_public_repos()
    print(f"Found {len(repos)} public repositories.")
    filtered_repos = filter_repos_with_main_and_date(repos)
    print(f"{len(filtered_repos)} repositories have a main branch and commits after {LAST_COMMIT_DATE}.")
    for repo in filtered_repos:
        repo_name = repo["name"]
        print(f"Checking repository: {repo_name}")
        jenkinsfile_content = get_jenkinsfile_content(repo_name)
        if jenkinsfile_content:
            java_versions = find_java_version(jenkinsfile_content)
            print(f"Java version(s) in {repo_name}: {java_versions}")
        else:
            print(f"No Jenkinsfile found in {repo_name}")


if __name__ == "__main__":
    main()
Upvotes: 0
Reputation: 1879
You can now do regex searches in GitHub using forward slashes rather than quotes, so you can match both exact strings and patterns. Try the search /system <<-/ for an exact match, or /system[\s]*<<-/ for any number of whitespace characters in the middle, for example!
This was a new option included in the 2023 rollout of GitHub's Code Search feature. See the announcement blog post and the official syntax documentation.
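If you want to sanity-check the two patterns locally before pasting them into the search box, here is a minimal sketch. It uses Python's re module, which is my assumption for illustration only: GitHub's regex engine may behave differently in edge cases, and the sample lines are made up.

```python
import re

# The two patterns from this answer, without the surrounding slashes.
EXACT = re.compile(r"system <<-")
FLEXIBLE = re.compile(r"system[\s]*<<-")

# Hypothetical sample lines, invented for this illustration.
samples = [
    "system <<-EOS",    # exactly one space
    "system<<-EOS",     # no whitespace
    "system   <<-EOS",  # several spaces
    "system << EOS",    # different operator, should never match
]


def matching_lines(pattern, lines):
    """Return the lines in which the pattern occurs anywhere."""
    return [line for line in lines if pattern.search(line)]


exact_hits = matching_lines(EXACT, samples)
flexible_hits = matching_lines(FLEXIBLE, samples)
print(exact_hits)
print(flexible_hits)
```

The exact pattern should only pick up the single-space line, while the whitespace-tolerant one also picks up the no-space and multi-space variants.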
Upvotes: 4
Reputation: 15141
You couldn't (before 2022). The official GitHub searching rules:
Due to the complexity of searching code, there are a few restrictions on how searches are performed:
- Only the default branch is considered. In most cases, this will be the master branch.
- Only files smaller than 384 KB are searchable.
- Only repositories with fewer than 500,000 files are searchable.
- You must always include at least one search term when searching source code. For example, searching for language:go is not valid, while amazing language:go is.
- At most, search results can show two fragments from the same file, but there may be more results within the file.
- You can't use the following wildcard characters as part of your search query:
. , : ; / \ ` ' " = * ! ? # $ & + ^ | ~ < > ( ) { } [ ]
The search will simply ignore these symbols.
Update: GitHub supports literal strings now, but you can also try some more powerful ways below.
For complex search with regex support try Sourcegraph.
git-grep: Git supports searching source files with the git-grep command. Just clone a repository and run the command in its folder:
git grep "text-to-search"
Alternatives:
I recommend trying the ripgrep tool; it's fast and simple. It works like git-grep but looks nicer:
rg "text-to-search"
You can also use the standard grep to search for any text in files:
grep -r "text-to-search" /repository
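If none of those tools is installed, the same literal search over a cloned repository can be sketched in plain Python. This is only a rough stand-in for the commands above, not how git-grep or ripgrep work internally; the function name search_literal and the demo file are hypothetical.

```python
import os
import tempfile


def search_literal(root, needle):
    """Yield (path, line_number, line) for every line that contains needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's internal directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it


# Demo on a throwaway directory standing in for a cloned repository.
with tempfile.TemporaryDirectory() as repo:
    with open(os.path.join(repo, "deploy.rb"), "w", encoding="utf-8") as f:
        f.write("puts 'start'\nsystem <<-EOS\n  echo hi\nEOS\n")
    hits = [
        (os.path.basename(path), number, line)
        for path, number, line in search_literal(repo, "system <<-")
    ]

print(hits)
```

Because the needle is compared with a plain substring test, characters such as < and - need no escaping.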
Upvotes: 263
Reputation: 13700
Wrap the query in / (slashes):
To search for an occurrence of query = """, use /query = """/
To search for query = """ in test_*.py files, use path:**/test_*.py /query = """/
Use GitHub Search or Advanced GitHub Search, but be aware that the advanced search might still generate some unrecognized qualifiers (e.g. filename:)
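Queries like these contain quotes, slashes and asterisks, so they need percent-encoding if you want to share them as a link. Here is a minimal sketch, assuming the search page accepts the query in a q parameter together with type=code (that is how the URLs look in my browser, but treat the layout as an assumption); code_search_url is a hypothetical helper name.

```python
from urllib.parse import quote, unquote

SEARCH_PREFIX = "https://github.com/search?q="
SEARCH_SUFFIX = "&type=code"


def code_search_url(query):
    """Percent-encode the query and place it in a code search URL."""
    # safe="" also encodes "/" so the regex delimiters survive intact.
    return SEARCH_PREFIX + quote(query, safe="") + SEARCH_SUFFIX


query = 'path:**/test_*.py /query = """/'
url = code_search_url(query)
print(url)

# Decoding the q parameter gives back the original query unchanged.
encoded = url[len(SEARCH_PREFIX):-len(SEARCH_SUFFIX)]
print(unquote(encoded))
```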
Upvotes: 1
Reputation: 1036
As of 11/2/2021, this is possible by putting quotation marks around your search string.
While it's now possible to search exact strings, the functionality doesn't yet support searching on non-alphanumeric characters. Example:
Upvotes: 3
Reputation: 3484
UPDATE: As of November 2022, the solution above only works if you are signed in on GitHub.
You can enable the preview of the new search experience at this link: https://github.com/features/code-search-code-view/signup.
Then do exact match just by using quotes: "system <<-"
Upvotes: 26
Reputation: 1329592
You can: since Dec. 2021, your search, done from cs.github.com, can include special characters.
Improving GitHub code search (from Pavel Avgustinov)
Search for an exact string, with support for substring matches and special characters, or use regular expressions (enclosed in / separators).
So "system <<-" should work on that new search site.
Upvotes: 16
Reputation: 3274
If you quickly want to search inside a specific repo, try this:
Press . while viewing the repo to open it inside a browser-based VS Code window
Upvotes: -1
Reputation: 27003
If your search term is a filename or other substring which contains punctuation characters, a partial workaround to get GitHub's code search to return instances of that substring is to (1) replace the punctuation characters in your search term with spaces, and (2) enclose the search term in quotes.
For example, instead of using the search term:
repo:my_repo my_image_asset_1.svg
Try:
repo:my_repo "my image asset 1 svg"
This might not be a perfect solution in all cases; I imagine it might also match filenames like my-image-asset-1.svg. But depending on your use case, it might be "good enough"?
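The two steps of the workaround are easy to script if you have many filenames to look up. A minimal sketch, assuming ASCII punctuation only (string.punctuation); to_quoted_term is a hypothetical helper name:

```python
import string


def to_quoted_term(term):
    """Replace ASCII punctuation with spaces and wrap the result in quotes."""
    table = str.maketrans({ch: " " for ch in string.punctuation})
    # split() without arguments also collapses runs of spaces.
    words = term.translate(table).split()
    return '"' + " ".join(words) + '"'


quoted = to_quoted_term("my_image_asset_1.svg")
print(f"repo:my_repo {quoted}")
```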
Upvotes: 0
Reputation: 335
Adding to @mrgloom's answer: if you're looking for code in a specific programming language on GitHub using Google, you could do something like this in Google's search bar:
Example:
intext:"%% 2 == 0" ext:R site:github.com
Upvotes: 10
Reputation: 2319
If your package is in Debian, you can use their code search, which supports regular expressions: https://codesearch.debian.net/
Upvotes: 2
Reputation: 906
Today I was trying to look for an exact match of filter class in files named logback.xml in any repo on GitHub, and I came up with the following query, which did the job:
"filter class" in:file filename:logback.xml
To enable exact matches with quotes, you need to follow your search with the "in:file" modifier. The matches are not quite exact: the word "class" has to follow the word "filter", but it seems there can be zero or more spaces or symbol characters between the two words.
Upvotes: 34
Reputation: 21682
You can use Google directly.
How about this?
"your_string_to_search" site:github.com
"your_string_to_search" site:gist.github.com
Upvotes: 69