Reputation: 11670
I'm trying to extract a list of files defined in my .gitattributes file in bash.
The .gitattributes file looks like this:
#
# Exclude these files from release archives.
# This will also make them unavailable when using Composer with `--prefer-dist`.
# https://blog.madewithlove.be/post/gitattributes/
#
/.git export-ignore
/.github export-ignore
/bin export-ignore
/wp-content/themes/**/.storybook export-ignore
/wp-content/themes/**/assets export-ignore
/wp-content/themes/**/storybook export-ignore
/wp-content/themes/**/tests export-ignore
/wp-content/themes/**/.editorconfig export-ignore
/wp-content/themes/**/.env.testing export-ignore
/wp-content/themes/**/.eslintignore export-ignore
/wp-content/themes/**/.eslintrc export-ignore
/wp-content/themes/**/.gitignore export-ignore
/wp-content/themes/**/.stylelintrc export-ignore
/wp-content/themes/**/babel.config.js export-ignore
/wp-content/themes/**/composer.json export-ignore
/wp-content/themes/**/composer.lock export-ignore
/wp-content/themes/**/package.json export-ignore
/wp-content/themes/**/package-lock.json export-ignore
/wp-content/themes/**/phpcs.xml.dist export-ignore
/wp-content/themes/**/phpstan.neon export-ignore
/wp-content/themes/**/phpstan.neon.dist export-ignore
/wp-content/themes/**/postcss.config.js export-ignore
/wp-content/themes/**/webpack.config.js export-ignore
/wp-content/themes/**/CODE_OF_CONDUCT.md export-ignore
composer.lock -diff
yarn.lock -diff
package.lock -diff
#
# Auto detect text files and perform LF normalization
# http://davidlaing.com/2012/09/19/customise-your-gitattributes-to-become-a-git-ninja/
#
* text=auto
#
# The above will handle all files NOT found below
#
*.md text
*.php text
*.inc text
My bash script is inside the bin/ folder, and my .gitattributes is at the root of the project.
sh bin/test.sh path
The script looks like this
#!/bin/bash
#$1 - current_path variable (root)
file_list=()
while read -r line; do
if [[ "$line" =~ (\/wp-content\/themes\/\*\*/) ]]; then
newline=$(echo "$line" | sed 's/ export-ignore//p' | sed 's/\/wp-content\/themes\/\*\*\///p')
file_list+=("$newline")
fi
done <"$1"/.gitattributes
echo "${file_list[@]}"
But this returns every file name four times. When I run it I get:
.storybook
.storybook
.storybook
.storybook assets
assets
assets
assets storybook
storybook
storybook
storybook tests
tests
tests
tests .editorconfig
.editorconfig
.editorconfig
.editorconfig .env.testing
.env.testing
.env.testing
.env.testing .eslintignore
.eslintignore
.eslintignore
.eslintignore .eslintrc
.eslintrc
.eslintrc
.eslintrc .gitignore
.gitignore
.gitignore
.gitignore .stylelintrc
.stylelintrc
.stylelintrc
.stylelintrc babel.config.js
babel.config.js
babel.config.js
babel.config.js composer.json
composer.json
composer.json
composer.json composer.lock
composer.lock
composer.lock
composer.lock package.json
package.json
package.json
package.json package-lock.json
package-lock.json
package-lock.json
package-lock.json phpcs.xml.dist
phpcs.xml.dist
phpcs.xml.dist
phpcs.xml.dist phpstan.neon
phpstan.neon
phpstan.neon
phpstan.neon phpstan.neon.dist
phpstan.neon.dist
phpstan.neon.dist
phpstan.neon.dist postcss.config.js
postcss.config.js
postcss.config.js
postcss.config.js webpack.config.js
webpack.config.js
webpack.config.js
webpack.config.js CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
Expected output:
.storybook
assets
storybook
tests
.editorconfig
.env.testing
.eslintignore
.eslintrc
.gitignore
.stylelintrc
babel.config.js
composer.json
composer.lock
package.json
package-lock.json
phpcs.xml.dist
phpstan.neon
phpstan.neon.dist
postcss.config.js
webpack.config.js
CODE_OF_CONDUCT.md
What am I doing wrong?
Upvotes: 2
Views: 147
Reputation: 34034
As others will likely point out, there are other (simpler, more efficient) ways to do what the OP is looking to do; the objective of this answer is to address the behavior of the OP's current sed code.
By default sed will pass input through to stdout. Consider:
$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed 's/ export-ignore//'
/wp-content/themes/**/.storybook
By adding the p flag to the substitution you are telling sed to also print the result to stdout whenever a substitution was made. Consider:
$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed 's/ export-ignore//p'
/wp-content/themes/**/.storybook
/wp-content/themes/**/.storybook
As you can see we get 2 sets of output: one set due to the normal behavior of sed, and one set due to the additional p flag.
If you want to use the p flag and eliminate the 'duplicate' output you can add the -n (aka --quiet/--silent) flag, which disables sed's default behavior of passing input through to stdout. Consider:
$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed -n 's/ export-ignore//p'
/wp-content/themes/**/.storybook
Because you have 2 sed commands using the p flag, while not using the -n flag, you end up with a total of 4 copies of each matching input (the first sed generating 2 lines of output; the second sed then doubling the output again).
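The doubling at each stage can be checked by counting lines. This is only a small sketch that reuses the sample line from above; the variable names after_first and after_second are made up for the illustration.

```shell
#!/bin/bash
line='/wp-content/themes/**/.storybook export-ignore'

# Lines produced by the first sed alone (p flag, no -n).
after_first=$(echo "$line" | sed 's/ export-ignore//p' | wc -l)

# Lines produced once the second sed (also p flag, no -n) is chained on.
after_second=$(echo "$line" | sed 's/ export-ignore//p' | sed 's/\/wp-content\/themes\/\*\*\///p' | wc -l)

# $(( )) strips the padding that some wc implementations add.
echo "after first sed:  $((after_first)) lines"
echo "after second sed: $((after_second)) lines"
```

With the sample line this should report 2 lines after the first sed and 4 after the second.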
To remove the 'duplicates' there are a couple of options:
- remove the p flag from both sed commands, or ...
- add the -n flag to both sed commands
Upvotes: 4
Reputation: 784938
This can be done using a simple awk:
awk -F/ 'index($0, "/wp-content/themes/") == 1 {sub(/ .*/, "", $NF); print $NF}' .gitattributes
.storybook
assets
storybook
tests
.editorconfig
.env.testing
.eslintignore
.eslintrc
.gitignore
.stylelintrc
babel.config.js
composer.json
composer.lock
package.json
package-lock.json
phpcs.xml.dist
phpstan.neon
phpstan.neon.dist
postcss.config.js
webpack.config.js
CODE_OF_CONDUCT.md
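To see what the command does to a single entry, one line can be traced by hand. This is just an illustrative sketch; the sample line is taken from the question and the variable name is arbitrary.

```shell
#!/bin/bash
line='/wp-content/themes/**/.storybook export-ignore'

# Split on "/" and show the number of fields, then the untouched last field.
echo "$line" | awk -F/ '{print NF; print $NF}'

# The same line run through the full command.
name=$(echo "$line" | awk -F/ 'index($0, "/wp-content/themes/") == 1 {sub(/ .*/, "", $NF); print $NF}')
echo "$name"
```

The first awk call shows that the last field still carries the trailing " export-ignore", which is what the sub() call strips before printing.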
Explanation of the awk command:
- -F/ : Use / as input field separator
- index($0, "/wp-content/themes/") == 1 : Only lines that start with /wp-content/themes/
- sub(/ .*/, "", $NF) : Remove anything after the space in the last field
- print $NF : Print last field
Upvotes: 3
Reputation: 29
The quick fix would be to pipe the output through sort -u :-)
The root cause is the 'p' flag on your sed substitutions: it prints an extra copy of every line on which a substitution was made. You can just leave it out (see the sed manual on gnu.org).
If you need the results one filename per line, I would make the script
while read -r line; do
if [[ "$line" =~ (\/wp-content\/themes\/\*\*/) ]]; then
echo "$line" | sed 's/ export-ignore//' | sed 's/\/wp-content\/themes\/\*\*\///'
fi
done <"$1"/.gitattributes
or, even better, with awk
< "$1/.gitattributes" awk '
/\/wp-content\/themes\/\*\*\// {
gsub(/\/wp-content\/themes\/\*\*\//,"");
gsub(/ export-ignore/,"");
print $0;
}'
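If the names are still needed in a bash array, as with file_list in the question, the awk output could be read into one with mapfile. The following is a sketch under assumptions: it builds a small stand-in .gitattributes in a temporary directory (the root variable is hypothetical; the real script would use "$1"), and mapfile needs bash 4 or newer.

```shell
#!/bin/bash
# Hypothetical stand-in for the project root; the real script receives it as $1.
root=$(mktemp -d)
cat > "$root/.gitattributes" <<'EOF'
/bin export-ignore
/wp-content/themes/**/.storybook export-ignore
/wp-content/themes/**/assets export-ignore
composer.lock -diff
EOF

# Read one name per line into the array; -t drops the trailing newlines.
mapfile -t file_list < <(
  awk '/\/wp-content\/themes\/\*\*\// {
         gsub(/\/wp-content\/themes\/\*\*\//, "")
         gsub(/ export-ignore/, "")
         print
       }' "$root/.gitattributes"
)

# One array element per line.
printf '%s\n' "${file_list[@]}"
rm -rf "$root"
```

Because arrays and process substitution are bash features, a script like this has to be run with bash rather than a plain POSIX sh.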
Upvotes: 2