pavithra
pavithra

Reputation: 83

How to extract key-value pairs from CSV using Talend

I have data for one column in my CSV file as :

          `column1`
    
   row1 : {'name':'Steve Jobs','location':'America','status':'none'}
   row2 : {'name':'Mark','location':'America','status':'present'}
   row3 : {'name':'Elan','location':'Canada','status':'present'}

I want as the output for that column as :

`name`       `location`     `status`   

Steve jobs    America        none  
Mark          America        present
Elan          Canada         present

But sometimes I have row value like {'name':'Steve Jobs','location':'America','status':'none'},{'name':'Mark','location':'America','status':'present'}

Please help !

Upvotes: 1

Views: 485

Answers (1)

Jim Macaulay
Jim Macaulay

Reputation: 5141

You have to use tMap and tExtractDelimitedFields components.

Flow,

enter image description here

Below is the step by step explination,

  1. Original data - row1 : {'name':'Steve Jobs','location':'America','status':'none'}
  2. Substring the value inside the braces using below function row1.Column0.substring(row1.Column0.indexOf("{")+1, row1.Column0.indexOf("}")-1)
    Now the result is - 'name':'Steve Jobs','location':'America','status':'none'

3.Extract single columns to multiple using tExtractDelimitedFields. Since the columns are seperated be ,, delimiter should be provided as comma. And we have 3 fields in the data, so create 3 fields in the component schema. Below is the snipping of the tExtractDelimitedFields component configuration

enter image description here

Now the result is,

     name             location               status
'name':'Steve Jobs'  'location':'America'   'status':'none'
'name':'Mark'        'location':'America'   'status':'present'
'name':'Elan'        'location':'Canada'    'status':'present'
  1. Again using one more tMap, replace the column names and single quotes from the data,

     row2.name.replaceAll("'name':", "").replaceAll("'", "") 
     row2.location.replaceAll("'location':", "").replaceAll("'", "") 
     row2.status.replaceAll("'status':", "").replaceAll("'", "") 
    
    
    
    
  2. Your final result is below,

enter image description here

Upvotes: 1

Related Questions