user2548436

Reputation: 925

Bash remove substring in file from string

I have a string like this:

myString='value1|value57|value31|value21'

and a file called values_to_remove.txt that contains a list of values, one per line, like this:

values_to_remove.txt

value1
value31

In bash, how can I remove the values listed in values_to_remove.txt from the string? The values are separated by pipes, so when I remove a value I also have to remove the pipe next to it, if any, so that no leading, trailing, or doubled pipe is left behind.

I've achieved this in Python and called the Python script from bash, but I need to do this directly in bash with a one-line command rather than a small script; otherwise I could just keep using my little Python script.

This is the Python code:

myString = 'value1|value2|value3|value4'
arrString = myString.split("|")

with open("values_to_remove.txt", encoding="utf-8") as file:
    for l in file:
        value = l.rstrip("\n")  # drop the trailing newline before comparing
        if value in arrString:
            arrString.remove(value)

myNewString = "|".join(arrString)

Note that the values separated by pipes can be any string.

Thank you

Upvotes: 1

Views: 3187

Answers (3)

UrsaDK

Reputation: 865

A pure bash solution:

#!/usr/bin/env bash

# Define the location of the values-to-be-removed file
: ${PATH_TO_FILE:=${1:-"./values_to_remove.txt"}}

# Define the string we will be working with
: ${MY_STRING:=${2:-"value1|value57|value31|value21"}}

# Process all entries in PATH_TO_FILE, one by one
while read -r substring || [[ -n "$substring" ]]; do

  # Remove "substring|" from the beginning of MY_STRING
  MY_STRING=${MY_STRING#${substring}|}

  # Remove "|substring" from the rest of MY_STRING
  MY_STRING=${MY_STRING//|${substring}}

done < "${PATH_TO_FILE}"

# Return the results
echo "${MY_STRING}"

Why do we...

  • Use ${VAR_NAME:=${1:-"DEFAULT_VALUE"}} notation? – To let the user customise the script's inputs via either environment variables or script arguments. Basically, this notation says:

    • If VAR_NAME is already set (and not empty), then use it;
    • Otherwise, set VAR_NAME to the value of the corresponding script argument (the first one for PATH_TO_FILE, the second one for MY_STRING);
    • If that argument doesn't exist either, then set VAR_NAME to the DEFAULT_VALUE.
  • Use read -r substring || [[ -n "$substring" ]] to read the file? – read lets us read the content of the ./values_to_remove.txt file line by line. The [[ -n "$substring" ]] part is there to catch the last line of the file if it doesn't end with a newline.
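
The two idioms explained above can be tried in isolation. The following is only a minimal sketch: pick_value, print_lines and DEMO_VAR are hypothetical names that do not appear in the script, and inside a function $1 refers to the function's first argument rather than the script's.

```bash
#!/usr/bin/env bash

# pick_value, print_lines and DEMO_VAR are hypothetical names used only here.

# Resolve DEMO_VAR: keep it if it is already set and non-empty, otherwise fall
# back to the first argument, otherwise to the literal "fallback".
pick_value() {
  : "${DEMO_VAR:=${1:-fallback}}"
  printf '%s\n' "$DEMO_VAR"
}

# Print every line of standard input, including a final line without a
# trailing newline: read fails at end of input but still fills "line".
print_lines() {
  local line
  while read -r line || [[ -n "$line" ]]; do
    printf '<%s>\n' "$line"
  done
}

# Subshells keep the demo assignments from leaking into later commands.
(unset DEMO_VAR; pick_value from-arg)      # prints: from-arg
(DEMO_VAR=from-env; pick_value from-arg)   # prints: from-env
printf 'value1\nvalue31' | print_lines     # prints: <value1> then <value31>
```

Without the || [[ -n "$line" ]] part, the last call would print only <value1>, because read returns a non-zero status when it reaches the end of input before seeing a newline.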


Upvotes: 1

kabanus

Reputation: 25895

Here is a bash solution (the if statement is a runtime optimization that skips the replacement when there is no match, thanks @Inian):

for val in value1 value31; do
    if [[ "$mystring" =~ \|$val|$val\| ]]; then
        mystring=${mystring/$BASH_REMATCH/}     
    fi
done

This uses bash's built-in regex matching to find the first occurrence of either |value or value| and removes it. Note that you cannot match both pipes at the same time, because then you would delete too many separators. If there is a chance that the string contains no separators, you need to add ? after each pipe (maybe just the second one is enough).

You can also avoid regular expressions and just attempt to delete both a prior and a posterior pipe:

for val in value1 value31; do
    mystring=${mystring/|$val/}
    mystring=${mystring/$val|/}
done
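
One caveat with plain substring replacement: |$val also matches the start of a longer value, so removing value3 would damage value31. If only whole values should be removed, one possible sketch is to pad the string with a pipe on each side so that every value is delimited on both ends. remove_field is a hypothetical helper name, not part of the answer above.

```bash
#!/usr/bin/env bash

# remove_field is a hypothetical helper: remove_field STRING VALUE
# Removes every whole occurrence of VALUE from a pipe-separated STRING.
remove_field() {
  local padded="|$1|" val=$2

  # Loop so that adjacent duplicates ("a|a") are removed too; the quoted
  # pattern is matched literally, not as a glob.
  while [[ $padded == *"|$val|"* ]]; do
    padded=${padded/"|$val|"/|}
  done

  # Strip the padding pipes again.
  padded=${padded#|}
  padded=${padded%|}
  printf '%s\n' "$padded"
}

mystring='value1|value57|value31|value21'
for val in value1 value31; do
  mystring=$(remove_field "$mystring" "$val")
done
echo "$mystring"   # prints: value57|value21
```

The trade-off is one command substitution per value, which is slower than the in-place expansions but leaves partially matching values such as value31 untouched when value3 is removed.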

All of these can be written on one line if you really need to:

 for val in value1 value31; do [[ "$mystring" =~ \|$val|$val\| ]] && mystring=${mystring/$BASH_REMATCH/}; done

Upvotes: 1

anubhava

Reputation: 785048

You may use this awk:

awk -v str="$myString" 'BEGIN {
   n = split(str, a, /\|/)
}
{
   val[$1]
}
END {
   for (i=1; i<=n; i++)
      if (!(a[i] in val))
         s = (s == "" ? "" : s "|") a[i]
   print s
}' values_to_remove.txt

value57|value21

  • This awk command first uses the split function to split the input string on |.
  • It stores all the values to be removed in another array, val.
  • In the END block, it loops through the split array and appends each value that is not found in the to-be-removed array to the output string.
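
The same split, filter and rejoin steps can also be sketched as a pipeline of standard tools. This is a different technique from the awk command above, not a rewrite of it. It assumes that no value contains a newline, and filter_values is a hypothetical function name.

```bash
#!/usr/bin/env bash

# filter_values is a hypothetical helper: filter_values STRING FILE
# STRING is pipe-separated; FILE lists the values to remove, one per line.
filter_values() {
  printf '%s\n' "$1" |
    tr '|' '\n' |          # split: one value per line
    grep -vxFf "$2" |      # drop lines that exactly match a line of FILE
    paste -sd '|' -        # rejoin the surviving values with pipes
}

# Demo with a temporary file standing in for values_to_remove.txt
remove_list=$(mktemp)
printf 'value1\nvalue31\n' > "$remove_list"
filter_values 'value1|value57|value31|value21' "$remove_list"   # prints: value57|value21
rm -f "$remove_list"
```

Here -F treats each line of the file as a fixed string rather than a regular expression, and -x requires the whole line to match, so value3 in the file would not remove value31.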

Upvotes: 3
