Hakim

Reputation: 11720

Split strings containing pipe characters

I am working with some text containing field-separators which are "||||":

substring1||||substring2

The substrings might also contain whitespace. I want to split these strings only on the delimiter "||||", but I couldn't find a proper way to do that. I tried the following command:

echo "substring1||||substring2" | awk '{split($0,a,"||||"); print a[2],a[1]}'

That command works if the separator is a single "|", but my problem is that the separator consists of more than one pipe character.

I also tried

a=($(echo "substring1||||substring2" | sed -e "s/||||/\n/g")) 

It works fine if the substrings don't contain whitespace. But since the substrings might contain whitespace, they are also split on the spaces, which is not desired.

Any idea?

Upvotes: 2

Views: 1444

Answers (4)

Chris Seymour

Reputation: 85873

With GNU awk you can describe what a field is using FPAT instead of describing what the field separator is:

$ echo "substring1||||substring2" | awk '{print $1,$2}' FPAT='[^|]+' OFS='\n'
substring1
substring2 

Upvotes: 3

doubleDown

Reputation: 8408

The pattern used by split in awk is actually a regex, so |||| might be interpreted as four alternation operators instead of four literal vertical bars (I'm not sure, because under certain conditions | can be a literal vertical bar).

To match vertical bars, use \| or [|]. So for what you want, you can do this

awk '{ split($0, a, /\|+/); print a[2],a[1]}' file

Note that I used /.../ (a regex constant) to enclose the pattern instead of quotes (a dynamic regex). The gawk manual has details about the difference.
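As a minimal sketch of the split() approach above: split() returns the number of pieces, so the fields can be walked in a loop, and spaces inside a field are left alone. The helper name split_fields is hypothetical and exists only for this illustration; the pattern is written as [|]+ rather than \|+, which the answer treats as equivalent.

```shell
# split_fields is a hypothetical helper used only for this illustration.
split_fields() {
    printf '%s\n' "$1" | awk '{
        # Regex constant: one or more literal vertical bars.
        n = split($0, a, /[|]+/)
        # split() returns the number of pieces stored in the array a.
        for (i = 1; i <= n; i++)
            print i ": " a[i]
    }'
}

split_fields 'sub string 1||||sub string 2'
```

Because awk only splits on the pattern given to split(), the whitespace inside each substring survives.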


If you want to write column 1 to one file, and column 2 to another file, you can do it all in awk (I'm using Birei's way because it is more concise).

awk -F'[|]+' '{c1 = c1 $1 "\n"; c2 = c2 $2 "\n"} END {printf "%s", c1 >"file1"; printf "%s", c2 >"file2"}' input_file

This appends each column 1 entry to c1 and each column 2 entry to c2, separated by newlines, then writes both to separate files after the whole input file has been processed.

Notes:

  1. Concatenation works in awk by placing the strings side by side.
  2. I used printf, which doesn't append a newline, because c1 and c2 already end with a newline. Passing the data through "%s" keeps any % characters in the input from being interpreted as format specifiers.
  3. Most of the horizontal spacing in the awk script is optional.
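For comparison, here is a sketch of a different way to get the same two files: print each column as it is read instead of accumulating everything until END. This is a swapped-in variant, not the command above. The scratch directory, the sample input, and the variable names out1, out2, col1 and col2 are assumptions made for the example.

```shell
# Work in a scratch directory; all paths below are made up for this sketch.
workdir=$(mktemp -d)
printf '%s\n' 'a 1||||b 1' 'a 2||||b 2' > "$workdir/input_file"

# Inside awk, > truncates a file only when it is first opened;
# later prints to the same file keep appending. print adds the newline.
awk -F'[|]+' -v out1="$workdir/file1" -v out2="$workdir/file2" '
    { print $1 > out1; print $2 > out2 }
' "$workdir/input_file"

# Capture the results, then remove the scratch directory.
col1=$(cat "$workdir/file1")
col2=$(cat "$workdir/file2")
rm -rf "$workdir"

printf 'column 1:\n%s\ncolumn 2:\n%s\n' "$col1" "$col2"
```

Writing line by line avoids holding the whole input in memory, at the cost of keeping both output files open while awk runs.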

Sidenote: the value of -F is actually a dynamic regex, so the equivalent of '[|]+' is '\\|+'.
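A quick way to check the sidenote is to run both spellings on the same line and compare the results. This is only a sketch; the variable names line, with_bracket and with_escape are made up for the example.

```shell
line='sub string 1||||sub string 2'

# Bracket expression: | is literal inside [...].
with_bracket=$(printf '%s\n' "$line" | awk -F'[|]+' '{ print $2 }')

# Dynamic regex: awk first turns \\ into \, leaving the regex \|+.
with_escape=$(printf '%s\n' "$line" | awk -F'\\|+' '{ print $2 }')

printf '%s\n%s\n' "$with_bracket" "$with_escape"
```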

Upvotes: 2

Birei

Reputation: 36282

Use a regular expression as input field separator, like:

awk -F'[|]{4}' '{ printf "Field 1 -> %s\nField 2 -> %s\n", $1, $2 }' infile

Assuming infile with content:

sub string 1||||sub string2

It yields:

Field 1 -> sub string 1
Field 2 -> sub string2

EDIT: For older awk versions that don't accept {n} syntax use -F'[|][|][|][|]' or -F'[|]+' instead, like:

awk -c -F'[|]+' '{ printf "Field 1 -> %s\nField 2 -> %s\n", $1, $2 }' infile

Adding --re-interval makes the {4} form work as well (thanks to blue for the comment):

awk -c --re-interval -F'[|]{4}' '{ printf "Field 1 -> %s\nField 2 -> %s\n", $1, $2 }' infile
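The two fallback separators are not interchangeable when a field may itself contain a single "|". The sketch below, using a made-up input line, shows the difference: the four-bracket form splits only on exactly four bars, while '[|]+' splits on any run of bars.

```shell
# Sample line (invented for this sketch) whose first field contains a lone bar.
line='a|b||||c d'

# Exactly four bars, written without the {n} interval syntax.
exact=$(printf '%s\n' "$line" | awk -F'[|][|][|][|]' '{ print $1 }')

# Any run of one or more bars.
greedy=$(printf '%s\n' "$line" | awk -F'[|]+' '{ print $1 }')

printf 'exactly four: %s\nany run:      %s\n' "$exact" "$greedy"
```

Here exact holds a|b and greedy holds only a, so '[|]+' is a safe substitute only when the data never contains a stray bar.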

Upvotes: 9

Debaditya

Reputation: 2497

Try using sed and tr and see if it helps.

Input.txt

sub string 1||||sub string 2
            or
 substring1||||substring2

Code

  sed 's/||*/%~%/g' Input.txt| tr "%~%" '\n' | sed '/^$/d'

Note

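Use any marker, such as "%~%", made of characters that do not appear anywhere in your text file. sed replaces each run of bars with the marker, tr turns every character of the marker into a newline (tr translates single characters, not whole strings), and the final sed deletes the resulting empty lines.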
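A sed-only variant avoids the marker altogether and, read line by line, keeps the whitespace inside each substring. This is a sketch rather than the command above: the helper name split_lines is hypothetical, and it assumes the substrings themselves contain no newlines.

```shell
# split_lines is a hypothetical helper used only for this illustration.
split_lines() {
    # In a basic regex a bare | is literal. A backslash followed by a real
    # line break is the portable way to put a newline in a sed replacement.
    printf '%s\n' "$1" | sed 's/||||/\
/g'
}

# Read one piece per line; IFS= and -r keep spaces and backslashes intact.
split_lines 'sub string 1||||sub string 2' |
while IFS= read -r piece; do
    printf '[%s]\n' "$piece"
done
```

Reading with while read instead of an unquoted $(...) expansion is what prevents the shell from splitting the pieces again on spaces.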

Upvotes: -3
