user3399613

Reputation: 15

Problems with traversing Array in csv file in bash script

So what I'm trying to do in my code is basically read in a spreadsheet that has this format

username,   lastname,   firstname,    x1,      x2,       x3,      x4
user1,       dudette,    mary,         7,       2,                 4
user2,       dude,       john,         6,       2,        4,
user3,       dudest,     rad,
user4,       dudaa,      pad,          3,       3,        5,       9

Basically, it has usernames, the names those usernames correspond to, and a value for each x. What I want to do is read this in from a CSV file, find all of the blank cells, and fill them in with 5s. My approach was to read in the whole array and then substitute a 5 for every empty cell. This is the code so far...

#!/bin/bash

while IFS=$'\t' read -r -a myarray
do
echo $myarray
done < something.csv

for e in ${myarray[@]
do
echo 'Can you see me #1?'
if [[-z $e]]
echo 'Can you see me #2?'
sed 's//0'
fi
done

The code isn't actually changing my CSV file at all. EDITED NOTE: the data is all comma-separated.

What I've figured out so far:

Okay, the 'Can you see me' lines and the echo myarray line are test code. I wanted to see whether the whole CSV file was being read in, and judging by the output of echo myarray, it seems to be. The code doesn't seem to run through the for loop at all, though, which I can't understand.
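One thing that seems relevant here: read -a empties and refills the array on every pass, so the array never accumulates the whole file, and the final read that terminates the while loop fails at end-of-file and leaves the array empty, which would explain the for loop iterating zero times. (IFS=$'\t' also never splits comma-separated data.) A minimal sketch with made-up data:

```shell
# Sketch (hypothetical data): `read -a` removes all elements from the
# array before each assignment, and the failed read at EOF that ends
# the loop leaves it empty.
printf 'a,b,c\nd,e,f\n' > /tmp/demo.csv
while IFS=',' read -r -a myarray; do
    echo "line has ${#myarray[@]} fields"
done < /tmp/demo.csv
echo "after the loop: ${#myarray[@]} fields left"   # 0 fields left
```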

Help is much appreciated! :)

Upvotes: 0

Views: 842

Answers (3)

Emmet

Reputation: 6421

I'm sure there's a better or more idiomatic solution, but this works:

#!/bin/bash

infile=bashcsv.csv     # Input filename
declare -i i           # Iteration variable
declare -i defval=5    # Default value for missing cells
declare -i n_cells=7   # Total number of cells per line
declare -i i_start=3   # Starting index for numeric cells
declare -a cells       # Array variable for cells

# We'd usually save/restore the old value of IFS, but there's no need here:
IFS=','

# Convenience function to bail/bug out on error:
bail () {
    echo "$@" >&2
    exit 1
}

# Strip whitespace and replace empty cells with `$defval`:
sed 's/[[:space:]]//g' "$infile" | while read -a cells; do

    # Skip empty/malformed lines:
    if [ ${#cells[*]} -lt $i_start ]; then
        continue
    fi

    # If there are fewer cells than $n_cells, pad to $n_cells
    # with $defval; if there are more, bail:
    if [ ${#cells[*]} -lt $n_cells ]; then
        for ((i=${#cells[*]}; $i<$n_cells; i++)); do
            cells[$i]=$defval
        done
    elif [ ${#cells[*]} -gt $n_cells ]; then
        bail "Too many cells."
    fi

    # Replace empty cells with default value:
    for ((i=$i_start; $i<$n_cells; i++)); do
        if [ -z "${cells[$i]}" ]; then
            cells[$i]=$defval
        fi
    done

    # Print out whole line, interpolating commas back in:
    echo "${cells[*]}"
done

Here's a gratuitous awk one-liner that gets the job done:

awk -F'[[:space:]]*,[[:space:]]*' 'BEGIN{OFS=","} /,/ {NF=7; for(i=4;i<=7;i++) if($i=="") $i=5; print}' infile.csv
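For reference, here's the one-liner run against one of the sample rows (a sketch; the temp file name is just a placeholder):

```shell
# Hypothetical demo: the trailing comma on this row yields an empty
# seventh field, which the one-liner fills with 5.
cat > /tmp/infile.csv <<'EOF'
user2,       dude,       john,         6,       2,        4,
EOF
awk -F'[[:space:]]*,[[:space:]]*' 'BEGIN{OFS=","} /,/ {NF=7; for(i=4;i<=7;i++) if($i=="") $i=5; print}' /tmp/infile.csv
# prints: user2,dude,john,6,2,4,5
```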

Upvotes: 0

John B

Reputation: 3646

The format of your .csv file is not plain comma-separated: the fields are left-aligned and padded with a varying amount of whitespace. That makes it difficult to be accurate when trying to find and replace empty columns that are followed by non-empty columns.

Here is a Bash only solution that would be entirely accurate if the fields were comma separated.

#!/bin/bash

n=5
while IFS=, read -r username lastname firstname x1 x2 x3 x4; do
    ! [[ $x1 ]] && x1=$n
    ! [[ $x2 ]] && x2=$n
    ! [[ $x3 ]] && x3=$n
    ! [[ $x4 ]] && x4=$n
    echo "$username,$lastname,$firstname,$x1,$x2,$x3,$x4"
done < something.csv > newfile.csv && mv newfile.csv something.csv
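If the padded spacing from the question is kept as-is, the same loop idea still works after stripping blanks first. A sketch (it assumes the field values themselves contain no spaces, and defaults empty cells with ${var:-5} instead of the explicit tests):

```shell
# Hypothetical variant: remove padding so the file parses as plain CSV,
# then substitute 5 for any empty x column via ${var:-default}.
n=5
sed 's/[[:space:]]//g' something.csv |
while IFS=, read -r username lastname firstname x1 x2 x3 x4; do
    echo "$username,$lastname,$firstname,${x1:-$n},${x2:-$n},${x3:-$n},${x4:-$n}"
done
```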

Output:

username,lastname,firstname,x1,x2,x3,x4
user1,dudette,mary,7,2,5,4
user2,dude,john,6,2,4,5
user3,dudest,rad,5,5,5,5
user4,dudaa,pad,3,3,5,9

Upvotes: 1

Beel

Reputation: 1030

I realize you asked for bash, but if you don't mind perl in lieu of bash, perl is a great tool for record-oriented files.

#!/usr/bin/perl 
open (FILE, '<', 'something.csv') or die "Can't open input: $!";
open (OUTFILE, '>', 'outdata.txt') or die "Can't open output: $!";
while(<FILE>) {         
        chomp;          
        ($username,$lastname,$firstname,$x1,$x2,$x3,$x4) = split("\t");
        $x1 = 5 if $x1 eq "";
        $x2 = 5 if $x2 eq "";
        $x3 = 5 if $x3 eq "";
        $x4 = 5 if $x4 eq "";
        print OUTFILE "$username\t$lastname\t$firstname\t$x1\t$x2\t$x3\t$x4\n";
}
close (FILE);
close (OUTFILE);
exit;

This reads your input file, something.csv, which is assumed to have tab-separated fields, and writes a new file, outdata.txt, with the rewritten records.

Upvotes: 0
