Teja Magapu
Teja Magapu

Reputation: 53

Generating a CSV file in bash when data contains commas

I am having trouble with a script I wrote in bash. I am reading in a .txt file line by line in a while loop. The line is in the form of x:y:z:a:b:c. Each element ('x', 'a', etc.) corresponds to something such as age, height, name, and so on. I have successfully managed to print out the desired fields, but when I try to print the whole "x:y:z:a:b:c" "x:y:z" will print out to a .csv in one cell, and "a:b:c" will go to the next cell, due to the fact that there is a comma in the last name and first name. I know this sounds confusing so I can clarify further if necessary but here is what it looks like:

Cell 1                          Cell 2
age:height:number:last name     first name:language

I want all this information in one cell. I think the problem is due to the fact that there is a comma between the last and first name. Please let me know if you have any tips or pointers.

Edit: Here is what the code looks like:

while read INPUT do
    str=$INPUT
    IFS=: read -r -a ARR <<< "$str"
    NAME=${ARR[0]}
    AGE=${ARR[3]}
    echo $AGE, $NAME >>Filedirectory.filename.csv
done<filedirectory.filename.txt

...where filedirectory.filename.txt contains the following:

INPUT=Smith, John:126 lbs:67in:26:College University

Taking a sample input of str = "Smith, John:126 lbs:67in:26:College University" So, `NAME = Smith, John, AGE = 26. Ideally the output should be 2 cells next to each other populated as:

26 - Smith, John

Instead the output in excel when I open the CSV file ends up being:

12 - Smith - John

Where each '-' symbolizes a column (12 is in one cell, last name in the next, etc.) What I want is for Last Name and First Name to be in one cell, but apparently this is not happening due to the fact that the Name is formatted as "Smith, John" and the comma is messing things up

Upvotes: 1

Views: 1190

Answers (3)

Jake Ireland
Jake Ireland

Reputation: 652

One way I have found to do this is to is to use a CSV-aware tool, such as csvkit, which provides the command csvformat:

while IFS=: read -r name _ age _; do 
    echo -e "${name}\t${age}"; 
done < filedirectory.filename.txt | csvformat -t > outputfile.csv

Upvotes: 0

Walter A
Walter A

Reputation: 20002

You can use an Excel feature:
When the first line is SEP=:, the csv file will be read using this as SEP.
Now it is easy:

sed -r ' s/([^:]*):([^:]*):([^:]*):([^:]*):.*/\4:\1/;
        1s/.*/SEP=:\n&/'    < filedirectory.filename.txt > outputfile;

Upvotes: 0

Charles Duffy
Charles Duffy

Reputation: 295403

A simple implementation might look like:

while IFS=: read -r name _ age _; do
    printf '"%s","%s"\n' "${name//\"/\"\"}" "${age//\"/\"\"}"
done <in.txt >out.csv

Less correctly (see the APPLICATION USAGE and RATIONALE sections of the linked specification describing caveats), you could do this with echo:

while IFS=: read -r name _ age _; do
    echo '"'"${name//\"/\"\"}"'","'"${age//\"/\"\"}"'"'
done <in.txt >out.csv

In both cases, we're surrounding the cell contents with quotes, and using a parameter expansion to double-up any quotes found inside of the strings, which makes those quotes literal in CSV syntax.

Upvotes: 1

Related Questions