Claas-Thido Pfaff
Claas-Thido Pfaff

Reputation: 95

Convert JSON to string in PostgreSQL

I want to convert bibtex citations stored as bibjson pulled from a single JSON data field in PostgreSQL (13.5) into proper citation strings.

One example:

SELECT data
  FROM citation LIMIT 1;
 {"id": 1, "note": "DOI 10.1186/s12862-017-1032-x", "type": "article", "title": "Contrasting morphology with molecular data: an approach to revision of species complexes based on the example of European Phoxinus (Cyprinidae)", "author": [{"given": "Anja",
 "family": "Palandačić"}, {"given": "Alexander", "family": "Naseka"}, {"given": "David", "family": "Ramler"}, {"given": "Harald", "family": "Ahnelt"}], "issued": {"date-parts": [[2017]]}, "container-title": "BMC Evolutionary biology, 17(1), 184"}

I want to convert these finally to a proper citations string as latex would. For this I started with the title. I found out that PostgreSQL has a daunting amount of functions for dealing with JSON and that I can use e.g. '->' to get one elements text.

SELECT 
  data->>'title' AS title
  from citation limit 1; 

would give me then

"Contrasting morphology with molecular data: an approach to revision of species complexes based on the example of European Phoxinus (Cyprinidae)"

As a second step I wanted to get all the people and convert them to strings. And this is also where I have problems now to progress and my question arises.

SELECT 
  data->>'author' AS authors
  FROM citation limit 1; 

gives me this...

 [{"given": "R.", "family": "Reinartz"}, {"given": "S.", "family": "Lippold"}, {"given": "D.", "family": "Lieckfeldt"}, {"given": "A.", "family": "Ludwig"}]

but how would I smartly condense this into a proper string like this?

"Reinhartz, R. Lippold S., Lieckfeldt D., Ludwig, A."

Or do it even smarter and get Author XY et al if there is more than one author?

Any help with this would be appreciated.

UPDATE:

From the hints in the comments I tried jsonb_array_elements_text:

SELECT 
  jsonb_array_elements_text(data->'author') AS author 
FROM citation LIMIT 1;

However this gives me e.g. only the first object in the array.

{"given": "R.", "family": "Reinartz"}

Interestingly this only happens when I query form the table. When I copy paste the array manually like so:

select * from json_array_elements('[{"given": "R.", "family": "Reinartz"}, {"given": "S.", "family": "Lippold"}, {"given": "D.", "family": "Lieckfeldt"}, {"given": "A.", "family": "Ludwig"}]')

Then this step works:

                 value
-----------------------------------------
 {"given": "R.", "family": "Reinartz"}
 {"given": "S.", "family": "Lippold"}
 {"given": "D.", "family": "Lieckfeldt"}
 {"given": "A.", "family": "Ludwig"}

OK the limit of 1 restricts to 1 element in the output being shown of course. That was the reason here. So this works:

SELECT 
jsonb_array_elements_text(data->'author') AS author 
FROM citation;

And produces:

                 value
-----------------------------------------
 {"given": "R.", "family": "Reinartz"}
 {"given": "S.", "family": "Lippold"}
 {"given": "D.", "family": "Lieckfeldt"}
 {"given": "A.", "family": "Ludwig"}

In a next step process the inner objects:

select 
  concat(jsonb_array_elements(data->'author')->>'family', ' ',
    jsonb_array_elements(data->'author')->>'given') AS author
  from citation;

Concatenates the first and family name into a single string on each row.

@Bergi posted a solution which works out pasting together the title and the authors. See in his comment or:

SELECT data->>'title' || ' by ' || (SELECT string_agg((a->>'family') || ' ' || (a->>'given'), ', ') FROM jsonb_array_elements(data->'author') a) FROM citations

With a slight modification I now ended up with the following:

SELECT data->>'title' || 
       ' by ' || 
       (SELECT string_agg((a->>'family') || 
       ' ' || 
       (a->>'given'), ', ') FROM jsonb_array_elements(data->'author') a) || 
       ' ' || 
       concat('(', substring(data->'issued'->>'date-parts', '(?<=\[\[)\d{4}'), ')')
       FROM citation; 

Which also includes the date of the publication. Thanks all for you help.

Upvotes: 0

Views: 9072

Answers (1)

Claas-Thido Pfaff
Claas-Thido Pfaff

Reputation: 95

You can use the extensive json support of PostgreSQL for example:

SELECT data->>'title' || 
       ' by ' || 
       (SELECT string_agg((a->>'family') || 
       ' ' || 
       (a->>'given'), ', ') FROM jsonb_array_elements(data->'author') a) || 
       ' ' || 
       concat('(', substring(data->'issued'->>'date-parts', '(?<=\[\[)\d{4}'), ')')
       FROM citation; 

Thanks to all for your support!

Upvotes: 1

Related Questions