John Mulhall

Reputation: 173

How to wrap an unquoted address in a file with bash?

In my bash script, I have been trying unsuccessfully to wrap an address that is not enclosed in double quotes in double quotes, so that my script can read the address as one token and store it in an array element that holds addresses. That is, I want

42 Example Lane Bash City Bashland

to become

"42 Example Lane Bash City Bashland"

so I can assign it to ARRAY[4] in my script. Any ideas on how to wrap double quotes around the addresses that do not have them? It's in a .txt file whose lines are structured as follows:

FirstName LastName dd/mm/yyyy Address

How do I wrap double quotes around the addresses that do not have them? I tried sed, but I keep hitting a roadblock getting the unquoted addresses quoted so that the script can process each one as a single token.

Upvotes: 2

Views: 173

Answers (4)

SLePort

Reputation: 15471

Try this:

$ sed 's/\(.*[0-9]\{2\}\/[0-9]\{2\}\/[0-9]\{4\} \)\([^"]\)\(.*\)\([^"]\)$/\1"\2\3\4"/' <<< "John Doe 04/12/1960 42 Example Lane, Bash City, Bashland"  
John Doe 04/12/1960 "42 Example Lane, Bash City, Bashland"

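The first group captures everything up to and including the date and the space after it. The remaining groups capture the address, provided it neither starts nor ends with ". The backreferences then output the address groups surrounded by ", so an address that is already quoted is left unchanged.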

To edit a file in place, add the -i flag to sed (GNU sed syntax; BSD/macOS sed expects a backup-suffix argument, e.g. -i ''):

sed -i 's/\(.*[0-9]\{2\}\/[0-9]\{2\}\/[0-9]\{4\} \)\([^"]\)\(.*\)\([^"]\)$/\1"\2\3\4"/' file.txt

Edit:

Same result and maybe a bit more readable with Extended Regular Expressions:

sed -E 's/(.*[0-9]{2}\/[0-9]{2}\/[0-9]{4} )([^"])(.*)([^"])$/\1"\2\3\4"/' <<< 'John Doe 04/12/1960 42 Example Lane, Bash City, Bashland'
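
For a file that mixes quoted and unquoted addresses, here is a minimal sketch wrapping the same ERE command in a function. The name quote_addresses and the sample lines are made up for illustration; the regex is the one above.

```shell
# quote_addresses: hypothetical helper name (not from the answer).
# Reads lines on stdin and wraps any unquoted address that follows a
# dd/mm/yyyy date in double quotes; already-quoted addresses pass through.
quote_addresses() {
    sed -E 's/(.*[0-9]{2}\/[0-9]{2}\/[0-9]{4} )([^"])(.*)([^"])$/\1"\2\3\4"/'
}

# Sample input lines are assumptions for the demo.
printf '%s\n' \
    'John Doe 04/12/1960 42 Example Lane, Bash City, Bashland' \
    'Jane Roe 01/02/1970 "7 Quoted Street, Bash City"' | quote_addresses
# John Doe 04/12/1960 "42 Example Lane, Bash City, Bashland"
# Jane Roe 01/02/1970 "7 Quoted Street, Bash City"
```

Note that the pattern needs at least two characters in the address, since the first and last characters are matched by separate groups.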

Upvotes: 2

mklement0

Reputation: 440471

Unless performance is paramount, Bash's own read builtin offers a convenient solution:

(The example uses a here-document in lieu of a text input file; to use a file, replace <<'EOF' and all remaining lines with < your-file.txt.)

while read -r first last date addr; do
    [[ $addr == \"*\" ]] || addr="\"$addr\""
    echo "first: [$first], last: [$last], date: [$date], addr: [$addr]"
done <<'EOF'
First1 Last1 dd/mm/yyyy Address one unquoted
First2 Last2 dd/mm/yyyy "Address two double-quoted"
EOF

This yields:

first: [First1], last: [Last1], date: [dd/mm/yyyy], addr: ["Address one unquoted"]
first: [First2], last: [Last2], date: [dd/mm/yyyy], addr: ["Address two double-quoted"]

This solution:

  • takes advantage of the fact that read reads the remainder of the line into the last variable specified, if there are fewer variables than fields in the input line.

  • [[ $addr == \"*\" ]] tests if the value read into $addr is already "-enclosed (note the need to \-escape the " chars. so as to treat them as literals) and, if not (||), replaces the value of $addr with itself enclosed in ".
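
Since the question wants the address stored in an array element, here is a rough sketch of the same two steps feeding an array. The function name parse_line and the array name record are assumptions, not part of the original script; with four fields in a zero-based array, the address lands in element 3.

```shell
# parse_line: hypothetical helper; splits one record into the global
# array `record` (name is an assumption), quoting the address if needed.
parse_line() {
    local first last date addr
    # read puts the remainder of the line into the last variable, addr
    read -r first last date addr <<< "$1"
    # wrap in double quotes only if not already "-enclosed
    [[ $addr == \"*\" ]] || addr="\"$addr\""
    record=("$first" "$last" "$date" "$addr")
}

parse_line 'First1 Last1 01/02/2003 42 Example Lane Bash City Bashland'
printf '%s\n' "${#record[@]}" "${record[3]}"
# 4
# "42 Example Lane Bash City Bashland"
```

Quoting "$addr" inside the array assignment is what keeps the address as one element; the literal " characters stored in the value play no part in that.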


That said, given that double quotes are usually used as syntactic elements that enclose strings for delimiting rather than being part of the strings themselves, you may choose the opposite approach, namely to remove embedded enclosing " chars. from the addresses in the input:

while read -r first last date addr; do
    [[ $addr =~ \"(.*)\" ]] && addr="${BASH_REMATCH[1]}"
    echo "first: [$first], last: [$last], date: [$date], addr: [$addr]"
done <<'EOF'
First1 Last1 dd/mm/yyyy Address one unquoted
First2 Last2 dd/mm/yyyy "Address two double-quoted"
EOF

This yields:

first: [First1], last: [Last1], date: [dd/mm/yyyy], addr: [Address one unquoted]
first: [First2], last: [Last2], date: [dd/mm/yyyy], addr: [Address two double-quoted]

As you can see, the " chars. surrounding the address on the 2nd input line were removed from the value stored in $addr.

This solution:

  • uses =~, Bash's regex-matching operator to match addresses enclosed in literal double quotes (\"(.*)\")

  • and, if so (&&), redefines $addr to the string between the enclosing double quotes, via the value that the parenthesized subexpression (capture group, (.*)), captured (${BASH_REMATCH[1]}).
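
One caveat: the pattern as written is unanchored, so it also matches a quoted fragment in the middle of an address. A possible variant, sketched here under the assumption that only addresses that both start and end with " should be changed, anchors the regex with ^ and $. The function name unquote_addr is hypothetical.

```shell
# unquote_addr: hypothetical helper; prints its argument without the
# enclosing double quotes, leaving interior quotes alone.
unquote_addr() {
    local addr=$1
    local re='^"(.*)"$'    # anchored variant of the pattern above
    # $re is left unquoted on the right of =~ so it is treated as a regex
    [[ $addr =~ $re ]] && addr=${BASH_REMATCH[1]}
    printf '%s\n' "$addr"
}

unquote_addr '"Address two double-quoted"'
# Address two double-quoted
unquote_addr 'Flat "B" 12 Example Lane'
# Flat "B" 12 Example Lane
```

Keeping the regex in a variable avoids having to \-escape the " characters inside [[ ... ]].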

Upvotes: 1

Gordon Davisson

Reputation: 126078

In bash, you generally don't need (or want) quotes in your data. Quotes go around data, not in it. You should almost always have double-quotes around variable references, but almost never store any sort of quotes as part of the data in variables. But the details will depend on exactly what you're doing. Here's a quick example:

read -r firstName lastName date address <file.txt
# Note that if there are more space-separated "words" in the line than variables,
# `read` lumps everything into the last variable (i.e. address)
userArray=("$firstName" "$lastName" "$date" "$address")
# Double-quotes keep $address from being split into multiple array entries
echo "${userArray[0]} ${userArray[1]}'s address is:"
# Note that one set of double-quotes is enough to protect the whole string,
# even though there are multiple variable references in it.
echo "  ${userArray[3]}"

If you need quotes in the output, add them when outputting the data:

echo "Address='$address'" # Single-quotes around data
echo "Address=\"$address\"" # Double-quotes must be escaped inside other double-quotes

If you're looping over the file, you'd use something like this:

while read -r firstName lastName date address; do
    # do stuff with the data, for example:
    echo "$firstName $lastName lives at: $address"
done <file.txt

BTW, putting different kinds of data (name, date, etc) in arrays is a bit weird; usually you use arrays to store a list of values of the same type. But again, it depends on the exact situation.
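
As a rough sketch of that last point, the loop can collect just the addresses into one array, so every element holds the same kind of value. The array name addresses and the sample records are made up for illustration, and a here-document stands in for file.txt.

```shell
# Collect every address into one array (all elements are the same type).
# `addresses` and the sample records below are assumptions for the demo.
addresses=()
while read -r firstName lastName date address; do
    # Double quotes keep each address as a single array element
    addresses+=("$address")
done <<'EOF'
John Doe 04/12/1960 42 Example Lane Bash City Bashland
Jane Roe 01/02/1970 7 Other Street Bash City
EOF

printf '%s\n' "${#addresses[@]}" "${addresses[1]}"
# 2
# 7 Other Street Bash City
```

No quote characters are stored in the data here; the quoting happens only where the variable is expanded.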

Upvotes: 1

Walter A

Reputation: 20032

FirstName LastName dd/mm/yyyy Address? What about people with two first names or two last names? Luckily, you are only interested in the part after "/yyyy ". When you are a newbie, start with small steps that you understand:

# Remove first part of string. In `sed` you can use `#` as the delimiter when `/` is part of your string.
echo "Mr John F Someone 11/04/2008 44 street somewhere" | sed 's#.*/.... ##'
# Put string in quotes
echo "Mr John F Someone 11/04/2008 44 street somewhere" | sed 's/.*/"&"/'
# Put string in quotes differently (for later study)
printf '"%s"\n' "$(echo "Mr John F Someone 11/04/2008 44 street somewhere")" 
# Combine two sed commands (after a pipe you can enter a newline)
echo "Mr John F Someone 11/04/2008 44 street somewhere" | 
   sed -e 's#.*/.... ##' -e 's/.*/"&"/'
# Or 
echo "Mr John F Someone 11/04/2008 44 street somewhere" | 
   sed 's#.*/.... ##;s#.*#"&"#'
# Or
echo "Mr John F Someone 11/04/2008 44 street somewhere" | 
   sed -e 's#.*/.... \(.*\)#"\1"#'
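
If the name and date are needed as well as the address, the same idea of splitting at the date can be sketched in pure bash with a regex instead of sed (a swapped-in technique, not part of the commands above). The function name split_on_date and the variables name, date and address are assumptions.

```shell
# split_on_date: hypothetical helper; splits a record at its dd/mm/yyyy
# date, so names with any number of words are handled.
# Sets the globals name, date and address (names are assumptions).
split_on_date() {
    local re='^(.*) ([0-9]{2}/[0-9]{2}/[0-9]{4}) (.*)$'
    [[ $1 =~ $re ]] || return 1    # no date found: report failure
    name=${BASH_REMATCH[1]}
    date=${BASH_REMATCH[2]}
    address=${BASH_REMATCH[3]}
}

if split_on_date 'Mr John F Someone 11/04/2008 44 street somewhere'; then
    printf '%s\n' "$name" "$date"
    printf '"%s"\n' "$address"     # quotes are added only on output
fi
# Mr John F Someone
# 11/04/2008
# "44 street somewhere"
```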

Upvotes: 0
