Reputation: 11720
I am working with some text containing field-separators which are "||||":
substring1||||substring2
The substrings might also contain whitespace. I want to split these strings only on the delimiter "||||", but I couldn't find a proper way to do that. I tried the following command:
echo "substring1||||substring2" | awk '{split($0,a,"||||"); print a[2],a[1]}'
Actually, that command works if I have just one "|" as the separator, but my problem is that I have more than one pipe character.
I also tried
a=($(echo "substring1||||substring2" | sed -e "s/||||/\n/g"))
It works fine if the substrings don't contain whitespace. But since the substrings might contain whitespace, they are also split on the spaces, which is not desired.
Any idea?
Upvotes: 2
Views: 1444
Reputation: 85873
With GNU awk you can describe what a field is using FPAT instead of describing what the field separator is:
$ echo "substring1||||substring2" | awk '{print $1,$2}' FPAT='[^|]+' OFS='\n'
substring1
substring2
Upvotes: 3
Reputation: 8408
The pattern used by split in awk is actually a regex, so |||| might be treated as four alternation operators instead of four literal vertical bars (I'm not sure, because under certain conditions | can be a literal vertical bar). To match vertical bars, use \| or [|]. So for what you want, you can do this:
awk '{ split($0, a, /\|+/); print a[2],a[1]}' file
Note that I used /.../ (a regex constant) to enclose the pattern instead of quotes (a dynamic regex). The gawk manual has some details about the difference.
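To make the two forms concrete, here is a minimal sketch that runs both on a sample line with embedded spaces. The sample text and the variable names out_const and out_dynamic are only illustrative, and the dynamic form uses a bracket expression so that no backslash escaping is involved.

```shell
# Hypothetical sample line; both fields contain spaces.
line='sub string 1||||sub string 2'

# Regex constant: /\|+/ matches a run of one or more literal vertical bars.
out_const=$(printf '%s\n' "$line" |
    awk '{ n = split($0, a, /\|+/); print n; print a[1]; print a[2] }')

# Dynamic regex (a string): "[|]+" matches the same runs of vertical bars.
out_dynamic=$(printf '%s\n' "$line" |
    awk '{ n = split($0, a, "[|]+"); print n; print a[1]; print a[2] }')

printf '%s\n' "$out_const"
printf '%s\n' "$out_dynamic"
```

Both calls should report two pieces and leave the spaces inside each piece untouched.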
If you want to write column 1 to one file and column 2 to another file, you can do it all in awk (I'm using Birei's way because it is more concise).
awk -F'[|]+' '{c1 = c1 $1 "\n"; c2 = c2 $2 "\n"} END {printf c1 >"file1"; printf c2 >"file2"}' input_file
This appends the column 1 entries to c1, separated by newlines, and the column 2 entries to c2. Both are then printed to separate files after the input file has been processed.
Notes:
String concatenation is done in awk by placing the strings side by side.
I used printf, which doesn't append a newline, because we already have an extra newline at the end of c1 and c2.
... printf and its argument is optional.
Sidenote: the value of -F is actually a dynamic regex, so the equivalent of '[|]+' is '\\|+'.
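As a possible variation on the two-file command above, the columns can also be written line by line while the input is read, so nothing is accumulated in memory and the data never passes through printf as a format string (which matters if a field contains a % character). This is only a sketch: the scratch directory, the sample input, and the awk variables f1 and f2 are assumptions made for the demonstration.

```shell
# Work in a scratch directory so the demo file names do not clobber anything.
tmpdir=$(mktemp -d)
printf '%s\n' 'a b||||c d' 'e||||f g' > "$tmpdir/input_file"

# print adds the newline itself; each output file is opened once and
# stays open for the rest of the run, so later lines are appended.
awk -F'[|]+' -v f1="$tmpdir/file1" -v f2="$tmpdir/file2" \
    '{ print $1 > f1; print $2 > f2 }' "$tmpdir/input_file"

col1=$(cat "$tmpdir/file1")
col2=$(cat "$tmpdir/file2")
rm -rf "$tmpdir"

printf '%s\n' "$col1"
printf '%s\n' "$col2"
```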
Upvotes: 2
Reputation: 36282
Use a regular expression as input field separator, like:
awk -F'[|]{4}' '{ printf "Field 1 -> %s\nField 2 -> %s\n", $1, $2 }' infile
Assuming infile with content:
sub string 1||||sub string2
It yields:
Field 1 -> sub string 1
Field 2 -> sub string2
EDIT: For older awk versions that don't accept the {n} syntax, use -F'[|][|][|][|]' or -F'[|]+' instead, like:
awk -c -F'[|]+' '{ printf "Field 1 -> %s\nField 2 -> %s\n", $1, $2 }' infile
Also adding --re-interval, thanks to blue for his comment:
awk -c --re-interval -F'[|]{4}' '{ printf "Field 1 -> %s\nField 2 -> %s\n", $1, $2 }' infile
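If the fields are needed back in shell variables, as the a=(...) attempt in the question suggests, one option is to let awk print one field per line and read the lines back with read, which avoids word splitting on spaces. The following is a minimal sketch assuming exactly two fields per line; the variable names line, first, and second are illustrative.

```shell
# Hypothetical sample line; both fields contain spaces.
line='sub string 1||||sub string 2'

# awk prints each field on its own line; IFS= read -r then takes each
# line verbatim, so the spaces inside a field are preserved.
{
    IFS= read -r first
    IFS= read -r second
} <<EOF
$(printf '%s\n' "$line" | awk -F'[|]+' '{ print $1; print $2 }')
EOF

printf 'first=%s\nsecond=%s\n' "$first" "$second"
```

This relies on the fields themselves not containing newlines.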
Upvotes: 9
Reputation: 2497
Try using sed and tr and see if it helps.
Input.txt
sub string 1||||sub string 2
or
substring1||||substring2
Code
sed 's/||*/%~%/g' Input.txt| tr "%~%" '\n' | sed '/^$/d'
Note
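Use any placeholder, like the "%~%" I used, that does not appear in your text file, and replace it using sed and tr. Keep in mind that tr translates single characters rather than whole strings, so (at least with GNU tr) every % and every ~ becomes a newline, and the final sed removes the resulting empty lines. The pattern ||* also matches a single |, not only "||||".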
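A placeholder can be avoided entirely by replacing the delimiter with a newline in one step. Here is a minimal sketch using awk's gsub, assuming the goal is one field per line; the sample text deliberately contains % and ~ to show that they pass through unchanged, and the variable name result is illustrative.

```shell
# Replace every run of exactly four vertical bars with a newline.
# The bracket expressions keep each | literal in the regex.
result=$(printf '%s\n' 'sub 50% ~ string 1||||sub string 2' |
    awk '{ gsub(/[|][|][|][|]/, "\n"); print }')

printf '%s\n' "$result"
```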
Upvotes: -3