Reputation: 8137
I've always used an interface based git client (smartGit) and thus don't have much experience with the git console.
However, I now face the need to substitute a string in all .txt files from history (so, not erasing the whole file but just substituting a string). I found the following command:
git filter-branch --tree-filter 'git ls-files -z "*.php" |xargs -0 perl -p -i -e "s#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g"' -- --all
I tried this, and unfortunately noticed that while the password did get changed, all binary files got corrupted. Images, etc. would all be corrupted.
Is there a better way to do this that won't corrupt my binary files?
Thanks.
EDIT:
I got mixed up with something. The actual code that caused binary files to get corrupted was:
$ git filter-branch --tree-filter "find . -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;"
The code at the top actually removed all files with my password strangely enough.
Upvotes: 52
Views: 34388
Reputation: 1328602
With Git 2.24 (Q4 2019), git filter-branch (and BFG) is deprecated.
newren/git-filter-repo does NOT do what you want (see the Q4 2024 section below).
It had an example that was ALMOST what you wanted in its example section:
cd repo
git filter-repo --path-glob '*.txt' --replace-text expressions.txt
with expressions.txt:
literal:originalpassword==>newpassword
However, WARNING: as Hasturkun adds in the comments:
Using --path-glob (or --path) causes git filter-repo to only keep files matching those specifications.
The functionality to only replace text in specific files is available in bfg-ish as -fi, or in the lint-history script.
Otherwise, it looks like this is only currently possible with a custom commit callback.
See newren/git-filter-repo issue 74.
Which makes sense, considering the --replace-text option is itself a blob callback.
Q1 2024, newren/git-filter-repo issue 74 proposes (from Daniil):
Solution
git filter-branch --tree-filter "find . -path './src/*' -regextype egrep -regex '.*\.(hpp|cpp)' -exec perl -0777 -pe 's{\n\n\n+}{\n\n}g' -i {} \;" <branch/HEAD/hash..HEAD>
It replaced runs of more than one blank line with a single blank line.
Q4 2024, with commit 6157207 (probably for v2.46.0), the new --file-info-callback feature in git-filter-repo can help you substitute a string in all .txt files throughout the Git history without corrupting binary files.
The --file-info-callback feature allows you to write a Python function that operates on each file change (apart from deletions) in the repository's history.
This function can, for instance, limit its changes to specific files (for example, .txt files).
In your case, to substitute the string in all .txt files:
git filter-repo --file-info-callback '
  if not filename.endswith(b".txt"):
    # No changes for non-.txt files
    return (filename, mode, blob_id)
  data = value.data
  if blob_id in data:
    # This blob was already rewritten: reuse the cached result
    return (filename, mode, data[blob_id])
  contents = value.get_contents_by_identifier(blob_id)
  new_contents = contents.replace(b"originalpassword", b"newpassword")
  new_blob_id = value.insert_file_with_contents(new_contents)
  data[blob_id] = new_blob_id
  return (filename, mode, new_blob_id)
'
The data = value.data line uses a dictionary to cache processed blobs, avoiding redundant processing of the same file content (hence the if blob_id in data test).
If you need to replace multiple strings, you can chain .replace() calls or use regular expressions with the re module.
import re
new_contents = re.sub(b"originalpassword|PASSWORD1|PASSWORD2", b"newpassword", contents)
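One caveat with the regex variant: a password containing regex metacharacters (".", "$", "+" and so on) has to be escaped, and when one secret is a prefix of another the longer one should be tried first. A minimal sketch of that idea, assuming the secrets are known byte strings; the helper names build_secret_pattern and scrub are made up for this illustration and are not part of git-filter-repo:

```python
import re


def build_secret_pattern(secrets):
    """Compile one bytes pattern matching any of the given literal secrets."""
    # Longest first, so that a secret which is a prefix of another
    # does not win the alternation and leave a tail behind.
    ordered = sorted(secrets, key=len, reverse=True)
    # re.escape makes every secret match literally, metacharacters included.
    return re.compile(b"|".join(re.escape(s) for s in ordered))


def scrub(contents, secrets, replacement=b"newpassword"):
    """Return contents with every listed secret replaced."""
    pattern = build_secret_pattern(secrets)
    # A function replacement keeps backslashes in `replacement` literal.
    return pattern.sub(lambda match: replacement, contents)
```

Inside the callback, the new_contents line would then become something like new_contents = scrub(contents, [b"originalpassword", b"PASSWORD1"]).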
Upvotes: 24
Reputation: 25314
I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for rewriting files from Git history.
You should carefully follow these steps here: https://rtyley.github.io/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 7 or above) and run this command (where my-repo.git is the folder name of the bare clone of your repo):
$ java -jar bfg.jar --replace-text replacements.txt -fi '*.php' my-repo.git
The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line - note the comments shouldn't be included):
PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass # replace with 'examplePass' instead
PASSWORD3==> # replace with the empty string
regex:password=\w+==>password= # Replace, using a regex
regex:\r(\n)==>$1 # Replace Windows newlines with Unix newlines
Your entire repository history will be scanned, and .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.
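To make the rule format above concrete, here is a rough Python sketch of how such lines can be read: an optional regex: prefix, an optional ==> separator, and ***REMOVED*** as the default replacement. This only mirrors the format as described in this answer. It is not the BFG's implementation (the BFG is a Java tool with Java regex syntax), and parse_rule and apply_rules are names invented for the illustration:

```python
import re

DEFAULT_REPLACEMENT = "***REMOVED***"


def parse_rule(line):
    """Turn one rule line into (compiled_pattern, replacement_template)."""
    match, separator, replacement = line.partition("==>")
    if not separator:
        # No "==>" on the line: fall back to the default replacement.
        replacement = DEFAULT_REPLACEMENT
    if match.startswith("regex:"):
        pattern = re.compile(match[len("regex:"):])
        # Rewrite $1-style group references into Python's \1 style.
        replacement = re.sub(r"\$(\d+)", r"\\\1", replacement)
    else:
        # Literal rule: escape the match text and keep backslashes in
        # the replacement from being read as template escapes.
        pattern = re.compile(re.escape(match))
        replacement = replacement.replace("\\", "\\\\")
    return pattern, replacement


def apply_rules(text, lines):
    """Apply every rule line to text, in order."""
    for line in lines:
        pattern, replacement = parse_rule(line)
        text = pattern.sub(replacement, text)
    return text
```

Running a handful of sample strings through apply_rules is a cheap way to check what a rule will match before rewriting real history.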
Full disclosure: I'm the author of the BFG Repo-Cleaner.
Upvotes: 124
Reputation: 384234
More info on git-filter-repo
https://stackoverflow.com/a/58252169/895245 gives the basics, here is some more info.
Install
As of git 2.5 at least, it is not shipped with mainline git, so: https://superuser.com/questions/1563034/how-do-you-install-git-filter-repo/1589985#1589985
python3 -m pip install --user git-filter-repo
Usage tips
Here is the more common approach I tend to use:
git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx') HEAD
where:
Bash process substitution allows us to not create a file for simple replaces. If your shell does not support this feature, you just have to write it to a file instead:
echo 'my_password==>xxxxxxxx' > tmp
git filter-repo --replace-text tmp HEAD
HEAD makes it affect only the current branch
Modify only a range of commits
How to modify only a range of commits with git filter-repo instead of the entire branch history?
git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx') --refs HEAD~2..HEAD
Replace using the Python API
For more complex replacements, you can use the Python API, see: How to use git filter-repo as a library with the Python module interface?
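As a rough idea of what that can look like, the sketch below replaces a string in every blob but leaves anything that looks binary untouched, which is the concern raised in the question. It assumes the pip-installed module is importable as git_filter_repo and that FilteringOptions.default_options(), RepoFilter(args, blob_callback=...) and blob.data behave as described in the git-filter-repo documentation; check these against your installed version. The helper names and the NUL-byte heuristic are choices made for this example:

```python
def looks_binary(data, probe_size=8000):
    """Heuristic: treat content with a NUL byte near the start as binary."""
    return b"\0" in data[:probe_size]


def scrub_blob_data(data, old=b"my_password", new=b"xxxxxxxx"):
    """Return data with old replaced by new, leaving binary content alone."""
    if looks_binary(data):
        return data
    return data.replace(old, new)


def main():
    # Imported here so the helpers above can be tried out without
    # git-filter-repo installed. Assumption: module name git_filter_repo.
    import git_filter_repo as fr

    def blob_callback(blob, metadata):
        # Blobs carry no file name, so the only filter is binary vs. text.
        blob.data = scrub_blob_data(blob.data)

    args = fr.FilteringOptions.default_options()
    fr.RepoFilter(args, blob_callback=blob_callback).run()
```

Call main() from inside a fresh clone of the repository. Since blobs carry no file names, restricting the replacement to certain paths needs a commit callback or --file-info-callback instead.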
Upvotes: 7
Reputation: 210755
Since this comes up in Google for git replace text in history, and since using non-git tools is sometimes more trouble than it's worth, here's a command that will replace multi-line text all the way from ${COMMIT} onwards to HEAD.
Warning: This is NOT for beginners. It uses git filter-branch, so all of its caveats/pitfalls/etc. apply. Make sure you've committed/backed up everything you need to save, so you don't lose data.
With that said, create the alias in Bash as follows:
git config --global alias.filter-branch-replace-text '!main() { set -eu && if [ -n "${BASH_VERSION+x}" ]; then set -o pipefail; fi && local pattern patternq replacement replacementq commit && pattern="$1" && shift && replacement="$1" && shift && commit="$1" && shift && local sed_binary_flags="" && if [ msys = "${OSTYPE-}" ]; then sed_binary_flags="-b"; fi && patternq="$(printf "%s" "${pattern}" | sed ${sed_binary_flags} "s/'\''/'\''\\\\'\'''\''/g")." && patternq="'\''${patternq%.}'\''" && replacementq="$(printf "%s" "${replacement}" | sed ${sed_binary_flags} "s/'\''/'\''\\\\'\'''\''/g")." && replacementq="'\''${replacementq%.}'\''" && git filter-branch --tree-filter "for path in $(printf "%s\n" "$@" | sed ${sed_binary_flags} -e "s/'\''/'\''\\\\'\'''\''/g" -e "s/\(.*\)/'\''\1'\''/" | tr "\n" " ")"'\''; do if [ -f "${path}" ]; then perl -0777 -i -s -p -e "s/\\Q\$q\\E/\$s/sgm" -- -q='\''"${patternq}"'\'' -s='\''"${replacementq}"'\'' -- "${path}"; fi || break; done'\'' "${commit}~1..HEAD" --; } && main'
and you can then invoke it from Bash as follows:
git filter-branch-replace-text \
$')\r\n{' \
$') /* EOL */\r\n{' \
"${COMMIT}" \
src/*.txt
Note that this performs literal text replacement, not regular expression replacement.
If you need regexes, you'll need to remove the \Q and \E in the Perl command (which perform escaping) and properly escape the strings as needed for the s/$q/$s/sgm command yourself.
And if you want to pretty-print the script, you can format it like this:
(f="$(git --no-pager config --get alias.filter-branch-replace-text)" && eval "${f%&&*}" && declare -f "${f%%()*}")
Upvotes: 0
Reputation: 1085
I created a file at /usr/local/git/findsed.sh , with the following contents:
find . -name 'githubDirToSubmodule.sh' -exec sed -i '' -e 's/What I want to remove//g' {} \;
I ran the command:
git filter-branch --tree-filter "sh /usr/local/git/findsed.sh"
Explanation of commands
When you run git filter-branch, this goes through each revision that you ever committed, one by one. --tree-filter runs the findsed.sh script on each committed revision, saves it, then progresses to the next revision.
The find command finds a specific file or set of files and executes (-exec) the sed editor on that file. sed is a command that takes the regex after s/ and replaces it with the string between / and /g (blank in my example). {} is a reference to the files path that was given by the find command. The file path is fed to sed, so that sed knows what to work on. \; just ends the -exec command.
Separating the shell script and the command into separate pieces makes the quoting ('' or "") less complicated.
Peculiarities
I successfully implemented this on a Mac, where sed is apparently a particular (older?) version. This matters, as it sometimes behaves differently. Make sure to use sed -i '', or else it creates backup copies with "-e" appended to the file names, thinking that was the suffix I wanted for my backup files. -i '' says: don't make backup files, just edit the files in place.
Specifying -name 'filename.sh' helped me avoid another issue that I could not solve. There was another .sh file that ended without a newline character. For some reason, sed would add a newline character to the end, even though 's/blah/blah/g' did not match anything in that file. So instead of figuring out that issue, I just told find to ignore all other files.
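If Python is available, one way around both peculiarities is to do the replacement on raw bytes and write a file back only when its contents actually changed, so a file with no match (and no trailing newline) is never touched. This is a sketch, not what I ran; replace_in_tree and its parameters are made-up names, and the NUL-byte check is a simple heuristic for skipping binary files:

```python
import os


def replace_in_tree(root, old, new, suffix=".sh"):
    """Replace old with new (both bytes) in files under root ending in suffix.

    Returns the list of paths that were rewritten.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Never descend into a .git directory, should one be present.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                data = handle.read()
            if b"\0" in data:
                # Looks binary: leave it alone.
                continue
            updated = data.replace(old, new)
            if updated != data:
                # Only write when something changed, so untouched files
                # keep their exact bytes (including a missing final newline).
                with open(path, "wb") as handle:
                    handle.write(updated)
                changed.append(path)
    return changed
```

A small script calling replace_in_tree(".", b"What I want to remove", b"") could then be passed to --tree-filter in place of findsed.sh.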
Additional commands that work
Additionally, I found these commands to work in the findsed.sh file (only one command at a time, not multiple, so comment out the others with #):
find . -name '.publishNewZenPackFromGithub.sh.swp' -exec rm -f {} \;
find . -name '*' -exec grep -H PassToRemove {} \;
Enjoy!
Upvotes: 6
Reputation: 32260
You can avoid touching undesired files by passing -name "pattern" to find.
This works for me:
git filter-branch --tree-filter "find . -name '*.php' -exec sed -i -e \
's/originalpassword/newpassword/g' {} \;"
Upvotes: 43
Reputation: 93890
Could be a shell expansion issue. If filter-branch is losing the quotes around "*.php" by the time it evaluates the command, the pattern may be expanding to nothing, so git ls-files -z lists all files.
You could check the filter-branch source or try different quoting tricks, but what I'd do is just make a one-line shell script that does your tree-filter and pass that script instead.
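If you would rather not hand-quote at all, another option is to let a program build the filter string. The Python sketch below quotes each argument of a find/sed command with shlex.quote, so the glob and the sed script reach the inner shell intact, and returns the filter-branch command as an argument list so no outer shell is involved. The function names are invented for this example, and nothing is executed here:

```python
import shlex


def build_tree_filter(name_glob, sed_script):
    """Return a shell command string with every argument safely quoted."""
    find_argv = [
        "find", ".", "-name", name_glob,
        "-exec", "sed", "-i", "-e", sed_script, "{}", ";",
    ]
    # shlex.quote protects globs, spaces and quotes from the shell that
    # filter-branch uses to evaluate the tree filter.
    return " ".join(shlex.quote(arg) for arg in find_argv)


def build_filter_branch_argv(name_glob, sed_script):
    """Return the full git command as a list, ready for subprocess.run."""
    tree_filter = build_tree_filter(name_glob, sed_script)
    return ["git", "filter-branch", "--tree-filter", tree_filter, "--", "--all"]
```

Passing the returned list to subprocess.run(argv, check=True) hands the filter to git as a single argument, leaving only the inner layer of quoting, which shlex.quote has already handled.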
Upvotes: 2