Vadim Kantorov

Reputation: 1144

Process tab-separated (tsv) file with xargs while preserving quotes

I have a tab-separated file myfile.tsv of the form:

"a"      "b"
"c"      "d"

How can I process it with xargs (i.e. which options do I need to pass) so that the fields of each line are passed as two arguments to an external program? Each line should be split on tabs into the external command's arguments, and all characters must be preserved, including the double quotes. I don't want to mess up escaping, which is crucial if the values in the TSV are actually JSON-formatted.

E.g. I'd like to have cat myfile.tsv | xargs ... python -c 'import sys;print(sys.argv[1:])' to print

['"a"', '"b"']
['"c"', '"d"']

Thanks!

Upvotes: -1

Views: 98

Answers (4)

Filip B.

Reputation: 11

If you're not sure that there are exactly 2 fields per line, you have to read the file line by line and then split each line individually.
To demonstrate this, I added a line to the TSV file that has three fields and contains a backslash, since some tools such as read may treat a backslash as an escape character:

$ cat myfile.tsv
"a"     "b"
"c"     "d"
"e"     "f"     "g\t"

To split on tabs, set the xargs delimiter (-d, a GNU xargs option) to a tab character, which $(printf \\t) produces.

$ cat myfile.tsv | while IFS= read -r line ; do printf '%s' "$line" | xargs -d "$(printf \\t)" python -c 'import sys;print(sys.argv[1:])' ; done
['"a"', '"b"']
['"c"', '"d"']
['"e"', '"f"', '"g\\t"']

However, read can also do the field splitting itself: set the IFS variable (internal field separator) to a tab, read each line into an array variable (-a), and pass the array to the next command (python):

$ cat myfile.tsv | while IFS="$(printf \\t)" read -ra line ; do python -c 'import sys;print(sys.argv[1:])' "${line[@]}" ; done
['"a"', '"b"']
['"c"', '"d"']
['"e"', '"f"', '"g\\t"']
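
For comparison, the same per-line splitting can be sketched in Python without any shell quoting involved, since subprocess.run takes the argument list directly. This is only a minimal sketch: run_per_row, sample and printer are hypothetical names, and the child command is the question's python -c one-liner.

```python
import subprocess
import sys


def run_per_row(tsv_text, command):
    """Run `command` once per non-empty row, one argument per tab-separated field."""
    outputs = []
    # Split on "\n" only, so other characters inside a field are left alone.
    for line in tsv_text.split("\n"):
        if not line:
            continue
        fields = line.split("\t")
        # The list is handed to the child process as-is: no shell, no quote handling.
        result = subprocess.run(
            command + fields, capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout.rstrip("\n"))
    return outputs


# Hypothetical sample mirroring myfile.tsv above; the last field holds a literal backslash and t.
sample = '"a"\t"b"\n"c"\t"d"\n"e"\t"f"\t"g\\t"\n'
printer = [sys.executable, "-c", "import sys;print(sys.argv[1:])"]

for out in run_per_row(sample, printer):
    print(out)
```

Each row starts one child process, just like the shell loops above, so this is no faster; it only avoids the quoting and IFS details.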

Upvotes: 0

Diego Torres Milano

Reputation: 69388

You can use NUL characters as delimiters for xargs. This assumes exactly two fields per line, since -n2 passes two items to each invocation:

$ tr '\t\n' '\0\0' < myfile.tsv | xargs -0 -n2 python -c 'import sys;print(sys.argv[1:])'
['"a"', '"b"']
['"c"', '"d"']
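
This pipeline hands out items in pairs, so it relies on every line having exactly two fields. Here is a rough Python model of it that shows what happens otherwise. It is only an illustration, not how xargs is implemented; nul_delimited and group_items are made-up helper names, and the ragged input is a hypothetical example.

```python
def nul_delimited(tsv_bytes):
    # Model of the tr step: every tab and every newline becomes a NUL byte.
    return tsv_bytes.translate(bytes.maketrans(b"\t\n", b"\0\0"))


def group_items(stream, n):
    # Model of the xargs step: split on NUL, then hand out n items at a time.
    items = stream.split(b"\0")
    if items and items[-1] == b"":
        items.pop()  # the final NUL ends the last item, it does not start a new one
    return [items[i:i + n] for i in range(0, len(items), n)]


regular = b'"a"\t"b"\n"c"\t"d"\n'
ragged = b'"a"\t"b"\n"e"\t"f"\t"g"\n"c"\t"d"\n'  # hypothetical three-field line

print(group_items(nul_delimited(regular), 2))
print(group_items(nul_delimited(ragged), 2))
```

With the regular input the pairs line up with the rows. With the ragged input the line boundaries are lost, so "g" ends up paired with "c" and "d" is left on its own.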

Upvotes: 0

Paolo

Reputation: 26220

A bit hacky, but this works:

$ cat myfile.tsv | xargs -n2 -I{} bash -c 'sed -e s/^/\"/ -e s/$/\"/ -e s/\ /\"\ \"/ <<<"{}"' | python -c 'import fileinput 
for line in fileinput.input():
  print(line.split())
'
['"a"', '"b"']
['"c"', '"d"']

Upvotes: 0

ticktalk

Reputation: 922

I'm not sure why you want or need xargs. Here is a trivial example that processes the file in Python instead:

$ cat parseMe.py
#!/usr/bin/env python
import sys

for row in sys.stdin:
    # Drop only the line ending, then split the row on tabs.
    lst = row.rstrip('\n').split('\t')
    print(lst)

$ cat myfile.tsv | ./parseMe.py
['"a"', '"b"']
['"c"', '"d"']
['"e"', '"f"']

Feel free to ignore this if it isn't suitable or if I've missed the reason you need xargs.
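
If the processing is done in Python anyway, the standard csv module can do the tab splitting while leaving the quotes untouched. A minimal sketch, assuming the fields never contain literal tabs or newlines; rows_preserving_quotes is a hypothetical helper name.

```python
import csv
import io


def rows_preserving_quotes(tsv_text):
    # QUOTE_NONE tells the reader not to treat double quotes specially,
    # so they stay part of the field text.
    reader = csv.reader(
        io.StringIO(tsv_text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    return list(reader)


print(rows_preserving_quotes('"a"\t"b"\n"c"\t"d"\n'))
print(rows_preserving_quotes('{"k": "v"}\t[1, 2]\n'))
```

The second call uses made-up JSON-looking fields to show that braces, quotes and spaces inside a field come through unchanged.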

Upvotes: 0
