Reputation: 3028

Printing column from a string in bash

UPDATED QUESTION:

Ok, so I have a file with lines like this:

44:)   2.884E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  9.990E+02
45:)   2.884E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  9.990E+02
1:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  1.000E+05
2:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  1.000E+05

The numbers in the first column run from 1 to x (in this case 45) and then start over at 1, many times. I want to move some of the columns to a separate file. The indexes of the columns I want to move are stored in the variable/array $selected_columns (in this case 2, 5 and 8), and the number of columns I want to move is stored in $number_of_columns (in this case 3).

I then want to create 45 files, one for the selected columns for all 1:), one for the selected columns for all 2:) and so forth. I want to make this as general as possible since both the number of columns and the number running from 1 to x will change. The number x is always known and the columns to extract are chosen by the user.

ORIGINAL QUESTION:

I have a string fetched by egrep. Then I want to print some of the columns (words) in that string. The position (column index) is known in a list in my bash script. Currently it looks like this:

line=$(egrep " ${i}:\)" $1)

for ((j=1; j<=$number_of_columns; j++))
do
    awk $line -v current_column=${selected_columns[$j]} '{printf $(current_column)}' > "history_files/history${i}"
done

where number_of_columns is the number of columns that are to be printed and selected_columns contains the corresponding indexes of those columns. As an example, number_of_columns = 3 and selected_columns = [2 5 8], so I want to print words number 2, 5 and 8 from the string line to the file history${i}.

I am not sure what is wrong, but this has been done with some trial and error. The current error is awk: cannot open 0.000E+00 (No such file or directory).

Any help is appreciated!

Upvotes: 1

Answers (3)

Ed Morton

Reputation: 204638

In:

awk $line -v ...

$line holds the output of a grep, which is not something awk expects to see on its command line: awk will try to interpret those words as program text and input file names rather than as data. Also, this:

for ((j=1; j<=$number_of_columns; j++))
do
    anything > "history_files/history${i}"
done

will cause you to overwrite the history file every time through the loop. I don't know what you really wanted there.
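
If the intent was for every pass through the loop to add to the same file, one option is to redirect the loop as a whole rather than the command inside it. This is only a minimal sketch of that idea; print_selected and its arguments are hypothetical names, not part of the original script:

```shell
# Sketch: the redirection sits on "done", so the file is opened once
# for the whole loop instead of being truncated on every iteration.
# print_selected is a hypothetical helper: $1 is the output file,
# the remaining arguments are the words to write.
print_selected() {
    outfile=$1
    shift
    for word in "$@"; do
        printf '%s\n' "$word"
    done > "$outfile"
}
```

Using ">>" inside the loop would also stop the overwriting, but then the file keeps growing across separate runs of the script unless it is emptied first.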

You have a slew of other issues with your script, though. You said "As an example number_of_columns = 3 and selected_columns = [2 5 8], so I want to print word number 2, 5 and 8 from the string line to the file history${i}.".

That's trivial entirely in awk and you don't need to do a "grep" outside of awk either, so you could just do the whole thing as:

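# "[)]" matches a literal ")" without relying on backslash escapes in -v,
# and "${selected_columns[*]}" passes every element of the bash array.
awk -v pat=" ${i}:[)]" -v selected_columns="${selected_columns[*]}" '

BEGIN { number_of_columns = split(selected_columns,selected_columnsA) }

$0 ~ pat {
    sep=""
    for (j=1;j<=number_of_columns;j++) {
        current_column = selected_columnsA[j]
        # $current_column is the field whose number is in current_column
        printf "%s%s",sep,$current_column
        sep = "\t"
    }
    print ""
}
' "$1" > "history_files/history${i}"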

If that doesn't work for you, let's fix THAT instead of trying to fix the original script. It sounds like you have an enclosing loop outside of the above; chances are that could just be part of the awk script as well.

EDIT based on updated OP:

I've added lots of comments but let me know if you have questions:

$ cat file
44:)   2.884E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  9.990E+02
45:)   2.884E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  9.990E+02
1:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  1.000E+05
2:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  1.000E+05
$
$ cat tst.sh
selected_columns=(2 5 8)

selCols="${selected_columns[@]}"

awk -v selCols="$selCols" '

BEGIN { # Executed before the first line of the input file is read

    # Split the string of selected column numbers, selCols, into
    # an array selColsA where selColsA[1] has the value of the
    # first space-separated sub-string of selCols (i.e. the number
    # of the first column to print). Note that we do not need the
    # number of columns passed into the script, since the split()
    # builtin function returns the count of elements it put into
    # the array.
    numCols = split(selCols,selColsA)
}

{ # Executed once for every line of the input file

    # Create a numeric suffix like "45" from the first column
    # in the current line of the input file, e.g. "45:)" by
    # just getting rid of all non-digit characters.
    sfx = $1
    gsub(/[^[:digit:]]/,"",sfx)

    # Create the name of the output file by attaching that
    # numeric suffix to the base value for all output files.
    #histfile = "history_files/history" sfx
    histfile = "tmp" sfx


    # Loop through every column we want printed. selColsA[<index>]
    # gives us a column number which we can then use to access the
    # columns of the current line. Awk uses the builtin variable $0
    # to hold the current line, and it automatically splits it so
    # that $1 holds the first column, $2 is the second, etc. So
    # if selColsA[1] has the value 3, then $(selColsA[1]) would be
    # the value of the 3rd column of the current input line.
    sep=""
    for (i=1;i<=numCols;i++) {
        curCol = selColsA[i]

        # Print the current column, prefixed by a tab for all but
        # the first column, and without a terminating newline so the
        # next column gets appended to the end of the current output line.
        # Note that in awk "> file" has different semantics from shell:
        # it opens the file for writing the first time the line is hit,
        # like "> file" in shell, but then appends to it every time it is
        # hit afterwards, like ">> file" in shell.
        printf "%s%s",sep,$curCol > histfile
        sep = "\t"
    }
    # Add a newline to the end of the current output line
    print "" > histfile
}

' "$1"
$
$ ./tst.sh file
$
$ cat tmp1
3.593E-02       2.780E+02       1.000E+05
$ cat tmp2
3.593E-02       2.780E+02       1.000E+05
$ cat tmp44
2.884E-02       2.780E+02       9.990E+02
$ cat tmp45
2.884E-02       2.780E+02       9.990E+02
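
The "> file" behaviour described in the comments can be checked on its own. A minimal sketch, where awk_redirect_demo and its output path argument are hypothetical names used only for this illustration:

```shell
# Sketch: every input line is printed with "> out" inside awk.
# If awk truncated the file on each print, as a shell loop with ">"
# would, only the last line would survive; instead all of them do.
awk_redirect_demo() {
    out=$1
    printf 'a\nb\nc\n' | awk -v out="$out" '{ print $0 > out }'
}
```

After running awk_redirect_demo on some scratch path, the file should hold all three lines, which is why the script above can use a plain ">" for each history file.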

By the way, I used the words "column" and "line" above for your benefit since you're just learning, but FYI the awk terminology is actually "field" and "record".
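
To make the terminology concrete, here is a small sketch using awk's builtin variables NR (number of the current record) and NF (number of fields in it). The function name describe_records is just a hypothetical label:

```shell
# Sketch: report, for each record, how many fields it has and what
# the last one is. With no file arguments awk reads standard input.
describe_records() {
    awk '{ printf "record %d has %d fields, last field: %s\n", NR, NF, $NF }' "$@"
}
```

For the lines in your sample file, each record has 8 fields: the "44:)" style label plus the seven numbers.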

Upvotes: 1

Olaf Dietsche

Reputation: 74108

I guess you must change the awk line so that the string arrives on awk's standard input:

echo "$line" | awk -v current_column="${selected_columns[$j]}" ...
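
Spelled out in full, the idea is to hand the string to awk as input and the column number as a variable. This is only a sketch under that assumption; pick_field is a hypothetical helper name:

```shell
# Sketch: the line goes to awk on stdin, the column index via -v.
# $1 = the line of text, $2 = the index of the field to print.
pick_field() {
    printf '%s\n' "$1" | awk -v c="$2" '{ print $c }'
}

pick_field "1:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02" 5
```

awk splits on runs of whitespace by default, so the repeated spaces between the numbers do not shift the field numbering.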

For your updated question, assuming the columns are in an array $selected_columns: in your example file, the columns are separated by multiple adjacent spaces, so sed squeezes them into single spaces before cut sees them. If this is not true for your original file, you can omit the sed before grep.

columns=`echo ${selected_columns[*]} | sed 's/ /,/g'`
for i in `seq 45`; do
    sed -e 's/  */ /g' file | grep "^$i:)" | cut -d' ' -f $columns >file-$i
done
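
As an aside, the comma-separated list for cut can also be built without sed, by letting the shell join the arguments with IFS. A sketch only; join_by_comma is a hypothetical helper name:

```shell
# Sketch: inside the function IFS is set to a comma, so "$*" expands
# to all arguments joined by commas. "local" keeps the change to IFS
# from leaking into the rest of the script.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

join_by_comma 2 5 8
```

With the array from the question this would be called as columns=$(join_by_comma "${selected_columns[@]}").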

Upvotes: 3

djjolicoeur

Reputation: 484

I think you can use cut to do what you are trying to do, i.e.

echo "$line" | cut -d" " -f2,5,8 > "history_files/history${i}"

-d is your delimiter, I used spaces to test, hence the " "
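
One caveat: cut treats every single space as a delimiter, so with the padded lines from the question the field numbers would not line up. A minimal sketch that squeezes repeated spaces first, assuming the line does not start with a space; cut_columns is a hypothetical helper name:

```shell
# Sketch: tr -s collapses runs of spaces into one, so cut's field
# numbers match the visible columns.
# $1 = the line of text, $2 = a field list for cut such as 2,5,8
cut_columns() {
    printf '%s\n' "$1" | tr -s ' ' | cut -d' ' -f"$2"
}
```

If the line did start with spaces, field 1 would be empty and every index would be off by one.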

hope this helps

Upvotes: 0
