bashy
bashy

Reputation: 13

How to prepare a txt file to csv using bash script?

How do I prepare a given txt file in bash for csv import? The given structure is like this:

Salutation
Name
Surname
Telephone
E-Mail
Street
City
Gender
Employment
Income
*****
Salutation
Name
Surname
Telephone
E-Mail
Street
City
Gender
Employment
Income
*****
Salutation
Name
Surname
E-Mail
Street
City
Gender
Employment
Income
*****

As you can see the second record doesn't have an E-Mail address. Any other variation of a not given value is also possible. The values are given line by line and the records are separated by five stars.

I tried to use awk and grep in order to write the records into a separated file for csv import. How do I put the given multiply lines of a record in one line for csv and how do I keep the order if an item e.g. the telephone number is not given?

Many thanks in advance.

Upvotes: 0

Views: 92

Answers (2)

bashy
bashy

Reputation: 13

To make the problem more clearly:
I do get email requests which contain lots of unneeded stuff.
So I export the *.eml files to /tmp directory.
I collect the needed information like this into one file called Input.txt.
My code looks like this:

#!/bin/bash
touch /tmp/Input.txt
OUTFILE=/tmp/Input.txt
cat /dev/null > "$OUTFILE"
FILES=/tmp/*.eml
for f in $FILES
do
	grep 'Salutation         :' "$f" | sed 's/^.*: //' | perl -ne 'print "S1 $_"' >> "$OUTFILE"
  	grep 'Surname            :' "$f" | sed 's/^.*: //' | perl -ne 'print "S2 $_"' >> "$OUTFILE"
  	grep 'Name               :' "$f" | sed 's/^.*: //' | perl -ne 'print "S3 $_"' >> "$OUTFILE"
  	grep 'Telephone          :' "$f" | sed 's/^.*: //' | perl -ne 'print "S4 $_"' >> "$OUTFILE"
  	grep 'E-Mail             :' "$f" | sed 's/^.*: //' | perl -ne 'print "S5 $_"' >> "$OUTFILE"
  	grep 'Street             :' "$f" | sed 's/^.*: //' | perl -ne 'print "S6 $_"' >> "$OUTFILE"
  	grep 'City               :' "$f" | sed 's/^.*: //' | perl -ne 'print "S7 $_"' >> "$OUTFILE"
  	grep 'Date               :' "$f" | sed 's/^.*: //' | perl -ne 'print "S8 $_"' >> "$OUTFILE"
  	grep 'Size               :' "$f" | sed 's/^.*: //' | perl -ne 'print "S9 $_"' >> "$OUTFILE"
  	grep 'Animals            :' "$f" | sed 's/^.*: //' | perl -ne 'print "S10 $_"' >> "$OUTFILE"
  	grep 'Employment         :' "$f" | sed 's/^.*: //' | perl -ne 'print "S11 $_"' >> "$OUTFILE"
  	grep 'Income             :' "$f" | sed 's/^.*: //' | perl -ne 'print "S12 $_"' >> "$OUTFILE"
  	echo "*****" >> "$OUTFILE"
done

Finally I get the OUTFILE Input.txt like this:

S1 Mr
S2 John
S3 Smith
S4 1514009855
S5 [email protected]
S6 11 Elm Street
S7 Denver
S8 05/21/2016
S9 66
S10 Cat
S11 Officer
S12 20
*****
S1 Mrs
S2 Mary
S3 Wood
S4 65223457
S5 [email protected]
S6 60 Taft Ave.
S7 Boston
S8 04/26/2016
S10 Dog
S11 Secretary
S12 10
*****
S1 Mrs
S2 Lori
S3 White
S4 56325478
S6 730 Vista del Playa
S7 Anaheim
S8 01/22/2016
S10 Fish
S11 Teacher
S12 80
*****


So the first record is complete S1 till S12.
In the second record there is S9 missing and in the third one there is S5 and S9 missing.
The aim is to get these records read out from Input.txt and put them into a csv-file.
The csv should look like this, considering the missing items:
Salutation,Surname,Name,Telephone,E-Mail,Street,City,Date,Size,Animals,Employment,Income
Mr;John;Smith;1514009855;[email protected];11ElmStreet;Denver;05/21/2016;66;Cat;Officer;20
Mrs;Mary;Wood;65223457;[email protected];60TaftAve.;Boston;04/26/2016;;Dog;Secretary;10
Mrs;Lori;White;56325478;;730VistadelPlaya;Anaheim;01/22/2016;;Fish;Teacher;80

Upvotes: 0

user3486184
user3486184

Reputation: 2365

Typically a .csv file has fixed records - and fields that are not included are left empty. So your first example might be:

"Mr","John","Smith","555-1212","jsmith@foo","1 St","New York","M","CSV Wrangler","5"

and your second might be:

"Mrs","Mary","Brown",,"mbrown@foo","5 St","Ottawa","F","CSV Wrangler","5"

There is no way in your input file to detect which field is missing, though. That means you won't be able to reliably create a .csv file. You need to know field names and field values to do that, unless you infer fields based on content ("This contains an @ sign so must be an email address", etc.) Even that will fail if you have a record like:

****
Homer
Springfield
****

Is that first name and surname, or first name and city? You don't have a way to tell.

Upvotes: 1

Related Questions