Reputation: 61
I have a csv that contains plain text entries in cells as well as JSON arrays. I'm new to OpenRefine and GREL and having troubles finding an expression to use to clean this csv. I want to get just the value of the "name" key in the JSON arrays.
Example cells:
[{'name': 'Pixar Animation Studios', 'id': 3}]
[{'name': 'TriStar Pictures', 'id': 559}, {'name': 'Teitler Film', 'id': 2550}, {'name': 'Interscope Communications', 'id': 10201}]
[{'name': 'Twentieth Century Fox Film Corporation', 'id': 306}]
[{'iso_3166_1': 'US', 'name': 'United States of America'}]
Expected return values:
Pixar Animation Studios
TriStar Pictures, Teitler Film, Interscope Communications
Twentieth Century Fox Film Corporation
United States of America
Upvotes: 1
Views: 496
Reputation: 2830
If your data looks like this...
...Tom Morris' formula won't work. It seems that Open refine doesn't like single quotes in a Json. Also, since you sometimes have several "names", you'll have to retrieve them all with a forEach()
loop.
The formula is as follow:
forEach(value.replace("'", '"').parseJson(), v, v.name).join(',')
It means: replace the ' by ", parse the json and then, for each element in the array, put it in a variable v and get its value "name". Finally, join the resulting array with a comma.
Final result:
Upvotes: 1
Reputation: 10540
First you need to parse the string into a JSON object and then you'll be able to access the values using normal key-based dictionary access.
value.parseJson()['name']
You can use this expression to add a new column or to operate on the existing column using Transform.
Upvotes: 0