rcourtna
rcourtna

Reputation: 4639

Programmatic data conversion strategy

I have a product that imports certain data files from clients (ie: user directories, etc), and will export other types of data (ie: reports, etc). All import and exports are in currently in CSV format (rfc4180), and files are passed back and forth through managed file transfers.

Increasingly, I'm seeing requests from clients to transform and reconfigure these data files for use in their legacy systems. For import data files, it's bizarre requests like:

"We're passing you 20 columns, from that apply $business_logic to columns 4,7,5,18,19 to determine the actual value your system needs in column 21, then drop those original columns cuz they aren't really useful by themselves"

or

"The value in column 2 is padded with zeros, please strip that off."

For data exports files, it's requests like:

"You are sending us .csv, but we need it in our special fixed width format."

or

"You are formatting numbers with decimals. Remove those, and prefix with 8 zeros."

Of course, every client we onboard has different requirements. I'm hesitant to dive in and write something from scratch as I imagine there are all sorts of gotchas along the way in building out files of different formats (csv, tsv, fixed width, excel, stone tablets), and dealing with character encoding, etc, etc. What I'm looking for is some sort of a dev framework (or commercial product) that would allow us to quickly satisfy the increasing number of (and variety of) data transformation requests. Something lightweight & simple is much preferred.

Any thoughts or experiences appreciated.

Upvotes: 3

Views: 161

Answers (1)

ramblingpolak
ramblingpolak

Reputation: 622

I'm not sure if it's a total fit but you can check out streamsets.com

It's an open-source tool for data movement and lightweight transformations. It allows you to provide minimal input schema (e.g. I have CSV files) so you don't have to deal with a lot of the things you mentioned.

*Full disclosure I'm an engineer at StreamSets

Upvotes: 1

Related Questions