user13648306
user13648306

Reputation: 39

Iterate over array and output TSV report

I have file with 30, 000 JSON lines delimited by new line. I am using JQ to process it.

Below is each line schema (new.json).

{
  "indexed": {
    "date-parts": [
      [
        2020,
        8,
        13
      ]
    ],
    "date-time": "2020-08-13T06:27:26Z",
    "timestamp": 1597300046660
  },
  "reference-count": 42,
  "publisher": "American Chemical Society (ACS)",
  "issue": "3",
  "content-domain": {
    "domain": [],
    "crossmark-restriction": false
  },
  "short-container-title": [
    "Org. Lett."
  ],
  "published-print": {
    "date-parts": [
      [
        2005,
        2
      ]
    ]
  },
  "DOI": "10.1021/ol047829t",
  "type": "journal-article",
  "created": {
    "date-parts": [
      [
        2005,
        1,
        27
      ]
    ],
    "date-time": "2005-01-27T05:53:29Z",
    "timestamp": 1106805209000
  },
  "page": "383-386",
  "source": "Crossref",
  "is-referenced-by-count": 38,
  "title": [
    "Liquid-Crystalline [60]Fullerene-TTF Dyads"
  ],
  "prefix": "10.1021",
  "volume": "7",
  "author": [
    {
      "given": "Emmanuel",
      "family": "Allard",
      "affiliation": []
    },
    {
      "given": "Frédéric",
      "family": "Oswald",
      "affiliation": []
    },
    {
      "given": "Bertrand",
      "family": "Donnio",
      "affiliation": []
    },
    {
      "given": "Daniel",
      "family": "Guillon",
      "affiliation": []
    }
  ],
  "member": "316",
  "container-title": [
    "Organic Letters"
  ],
  "original-title": [],
  "link": [
    {
      "URL": "https://pubs.acs.org/doi/pdf/10.1021/ol047829t",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "similarity-checking"
    }
  ],
  "deposited": {
    "date-parts": [
      [
        2020,
        4,
        7
      ]
    ],
    "date-time": "2020-04-07T13:39:55Z",
    "timestamp": 1586266795000
  },
  "score": null,
  "subtitle": [],
  "short-title": [],
  "issued": {
    "date-parts": [
      [
        2005,
        2
      ]
    ]
  },
  "references-count": 42,
  "alternative-id": [
    "10.1021/ol047829t"
  ],
  "URL": "http://dx.doi.org/10.1021/ol047829t",
  "relation": {},
  "ISSN": [
    "1523-7060",
    "1523-7052"
  ],
  "issn-type": [
    {
      "value": "1523-7060",
      "type": "print"
    },
    {
      "value": "1523-7052",
      "type": "electronic"
    }
  ],
  "subject": [
    "Physical and Theoretical Chemistry",
    "Organic Chemistry",
    "Biochemistry"
  ]
}

For every DOI, I need to obtain the values of given and family key in the same cell of the same row of that DOI in the CSV/TSV format.

The expected output for the above json is (in CSV/TSV format):

|DOI| givenName|familyName|
|10.1021/ol047829t|Emmanuel; Frédéric; Bertrand; Daniel;|Allard; Oswald; Donnio; Guillon|

I am using the below command line but it is throwing error and when I try to alter I am unable to get CSV/TSV output at all.
cat new.json | jq -r "[.DOI, .publisher, .author[] | .given] | @tsv" > manage.tsv


The same logic applies for subject key also. I am using the below command line to output values of subject key to CSV but it is throwing only the first element (in this case only: "Physical and Theoretical Chemistry")
cat new.json | jq -c -r "[.DOI, .publisher, .subject[0]] | @csv" > manage.csv

Any pointers for right jq command line will be of great help.

Upvotes: 0

Views: 220

Answers (1)

oguz ismail
oguz ismail

Reputation: 50785

Join given and family names by semicolons separately, then pass resulting strings as fields to the TSV filter.

["DOI", "givenName", "familyName"],
(inputs | [.DOI, (.author | map(.given), map(.family) | join("; "))])
| @tsv

Online demo

Note that you need to invoke JQ with -r and -n flags for this to work and produce a valid TSV output.

Upvotes: 1

Related Questions