anish anil

Reputation: 2641

sed/Awk/cut... How to decide which to use to parse Docker output?

My output:

docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
jenkins/jenkins     lts                 806f56c84444        8 days ago          703MB
mongo               latest              0da05d84b1fe        2 weeks ago         394MB

I would like to just cut the image ID alone from the output.

I tried using cut:

docker images | cut -d " " -f1
REPOSITORY
jenkins/jenkins

With -f1 I get only the repository names; with -f3 the output is empty. Since the delimiter is not a single space, I don't see how to get the desired output.

Can we cut based on field names?

I read the documentation and did not see anything relevant. I also saw that there is a way to achieve this using sed/AWK, which I'm still figuring out.

In the meantime, is there an easier way to achieve this using the cut command?

I'm new to Unix/Linux. How can I determine which of Sed/AWK/Cut to prefer?

Upvotes: 3

Views: 3241

Answers (6)

James Brown

Reputation: 37464

Can we cut based on field names? No.

How can I determine which of Sed/AWK/Cut to prefer? YMMV. For this particular input, where fields are separated by two or more spaces, you could set awk's field separator to "  +" (a space followed by one or more spaces, i.e. two or more spaces), look for the desired field name (IMAGE ID below) and print only that field:

$ awk -F"  +" '                     # set field separator
{
    if(f=="")                       # while we have not determined the desired field
        for(i=1;i<=NF;i++)          # ... keep looking
            if($i=="IMAGE ID")
                f=i
    if(f!="")                       # once found
        print $f                    # start printing it
}' file

Output:

IMAGE ID
806f56c84444
0da05d84b1fe

As one-liner:

$ awk -F"  +" '{if(f=="")for(i=1;i<=NF;i++)if($i=="IMAGE ID")f=i;if(f!="")print $f}' file
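The header lookup can also be wrapped in a small helper that takes the column name as an argument and skips the header line. This is only a sketch, not part of the answer above: `sample_images` is a hypothetical stand-in for `docker images` that reproduces the sample from the question, `column_by_name` is an illustrative name, and the approach assumes columns are always separated by two or more spaces.

```shell
# Hypothetical stand-in for `docker images`; reproduces the question's sample.
sample_images() {
    printf '%-20s%-20s%-20s%-20s%s\n' \
        'REPOSITORY' 'TAG' 'IMAGE ID' 'CREATED' 'SIZE' \
        'jenkins/jenkins' 'lts' '806f56c84444' '8 days ago' '703MB' \
        'mongo' 'latest' '0da05d84b1fe' '2 weeks ago' '394MB'
}

# Print the column whose header equals $1, without the header itself.
# Assumes columns are separated by two or more spaces.
column_by_name() {
    awk -F'  +' -v col="$1" '
        NR == 1 {                   # header line: remember the column index
            for (i = 1; i <= NF; i++)
                if ($i == col) f = i
            next
        }
        f { print $f }              # data lines: print only if the header was found
    '
}

sample_images | column_by_name 'IMAGE ID'
sample_images | column_by_name 'CREATED'
```

Because the separator is two or more spaces, a value with single spaces in it, such as "8 days ago", stays in one field.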

Upvotes: 0

tripleee

Reputation: 189749

In the general case, avoid parsing output meant for human consumption. Many modern utilities offer an option to produce output in some standard format like JSON or XML, or even CSV (though that is less strictly specified, and exists in multiple "dialects").

docker in particular has a generalized --format option which allows you to specify your own output format:

docker images --format "{{.ID}}"
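If more than one field is needed, the format string can put a tab between placeholders, which gives exactly the single-character delimiter that cut expects. The sketch below does not call Docker: `docker_images_tsv` is a hypothetical stand-in for `docker images --format '{{.Repository}}\t{{.ID}}'`, and it assumes Docker turns `\t` in the format string into a real tab character.

```shell
# Hypothetical stand-in for:
#   docker images --format '{{.Repository}}\t{{.ID}}'
# (assumption: Docker expands \t in the format string to a tab)
docker_images_tsv() {
    printf 'jenkins/jenkins\t806f56c84444\n'
    printf 'mongo\t0da05d84b1fe\n'
}

# cut splits on tabs by default, so no -d option is needed.
docker_images_tsv | cut -f2
```

With tab-separated output there is no padding to squeeze and no header line to skip.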

If you cannot avoid writing your own parser (are you really sure!? Look again!), cut is suitable for output with a specific single-character delimiter, or otherwise fairly regular output. For everything else, I would go with Awk. Out of the box, it parses columns from sequences of whitespace, so it does precisely what you specifically ask for:

docker images | awk 'NR>1 { print $3 }'

(NR>1 skips the first line, which contains the column headers.)

In the case of fixed-width columns, it allows you to pull out a string by index:

docker images | awk 'NR>1 { print substr($0, 41, 12) }'

... though you could do that with cut, too:

docker images | cut -c41-52

... but notice that Docker might adjust column widths depending on your screen size!
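One way to cope with changing column widths is to read the offset from the header line instead of hard-coding 41. This is a sketch under assumptions: `sample_images` is a hypothetical stand-in for `docker images`, `image_ids` is an illustrative name, and the ID is assumed to contain no spaces and to start in the same column as its header.

```shell
# Hypothetical stand-in for `docker images`; reproduces the question's sample.
sample_images() {
    printf '%-20s%-20s%-20s%-20s%s\n' \
        'REPOSITORY' 'TAG' 'IMAGE ID' 'CREATED' 'SIZE' \
        'jenkins/jenkins' 'lts' '806f56c84444' '8 days ago' '703MB' \
        'mongo' 'latest' '0da05d84b1fe' '2 weeks ago' '394MB'
}

# Locate "IMAGE ID" in the header, then read each data line from that
# offset up to (but not including) the next space.
image_ids() {
    awk '
        NR == 1 { start = index($0, "IMAGE ID"); next }
        start {
            id = substr($0, start)   # everything from the ID column onwards
            sub(/ .*/, "", id)       # drop the padding and later columns
            print id
        }
    '
}

sample_images | image_ids
```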

Awk lets you write regular expression extractions, too:

awk 'NR>1 { sub(/^([^[:space:]]*[[:space:]]+){2}/, ""); sub(/[[:space:]].*/, ""); print }'

This is where it overlaps with sed:

sed -n '2,$s/^[^ ]\+[ ]\+[^ ]\+[ ]\+\([^ ]\+\)[ ].*/\1/p'

though sed is significantly less human-readable, especially for nontrivial scripts. (This is still pretty trivial.)

If you haven't used regex before, the above will seem cryptic, but it really isn't very hard to pick apart. We are looking for sequences of non-spaces (a field in a column) followed by sequences of spaces (a column separator) - two before the ID field and whatever comes after it, starting from the first space after the ID column.
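The same pattern may be easier to read with extended regular expressions, where `+` and the capture group need no backslashes. This is a sketch that assumes a sed supporting the `-E` option (GNU and BSD sed both do); `sample_images` is a hypothetical stand-in for `docker images` and `third_column` is an illustrative name.

```shell
# Hypothetical stand-in for `docker images`; reproduces the question's sample.
sample_images() {
    printf '%-20s%-20s%-20s%-20s%s\n' \
        'REPOSITORY' 'TAG' 'IMAGE ID' 'CREATED' 'SIZE' \
        'jenkins/jenkins' 'lts' '806f56c84444' '8 days ago' '703MB' \
        'mongo' 'latest' '0da05d84b1fe' '2 weeks ago' '394MB'
}

# Pattern, piece by piece:
#   ^[^ ]+ +     first field and the spaces after it
#   [^ ]+ +      second field and the spaces after it
#   ([^ ]+)      third field, captured as \1
#    .*          a space and everything after it, discarded
# 2,$ restricts the substitution to lines 2 onwards (skips the header).
third_column() {
    sed -nE '2,$s/^[^ ]+ +[^ ]+ +([^ ]+) .*/\1/p'
}

sample_images | third_column
```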

If you want to learn shell scripting, you should probably also learn at least the basics of Awk (and a passing familiarity with sed). If you just want to get the job done, and perhaps aren't specifically interested in learning U*x tools (though you probably should be anyway!), perhaps instead learn a modern scripting language like Python or Ruby.

... Here's a Python docker library:

import docker
client = docker.from_env()
for image in client.images.list():
    print(image.id)

Upvotes: 0

I3ck

Reputation: 433

With Procedural Text Edit it's:

forEach line {
    if (contains ci "REPOSITORY") { remove }
    keepRange word 2 1
}
removeEmptyLines // <- optional

Upvotes: 0

Darby_Crash

Reputation: 446

Try this:

docker images | tr -s ' ' | cut -f3 -d' '

The command tr -s ' ' squeezes each run of spaces into a single space, after which cut can grab the field. This works fine as long as the field values themselves contain no spaces.
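The limitation is easy to see on the sample data. In this sketch `sample_images` is a hypothetical stand-in for `docker images`, and `tail -n +2` is added to skip the header line; neither is part of the command above.

```shell
# Hypothetical stand-in for `docker images`; reproduces the question's sample.
sample_images() {
    printf '%-20s%-20s%-20s%-20s%s\n' \
        'REPOSITORY' 'TAG' 'IMAGE ID' 'CREATED' 'SIZE' \
        'jenkins/jenkins' 'lts' '806f56c84444' '8 days ago' '703MB' \
        'mongo' 'latest' '0da05d84b1fe' '2 weeks ago' '394MB'
}

# Field 3 is the image ID, because nothing before it contains a space.
sample_images | tail -n +2 | tr -s ' ' | cut -d' ' -f3

# Field 4 is only the first word of CREATED ("8" and "2"),
# because "8 days ago" is split into three fields.
sample_images | tail -n +2 | tr -s ' ' | cut -d' ' -f4
```

The same effect applies to the header: "IMAGE ID" contains a space, so without `tail -n +2` the first line of output for `-f3` is just "IMAGE".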

Upvotes: 1

TenG

Reputation: 4004

You have to "squeeze" the space padding in the default output down to a single space, because cut treats every single space as a delimiter.

1  2 == 1-space-space-2 == Field 1 before the 1st space, Field 2 (empty) between the 1st and 2nd space, Field 3 after the 2nd space.

cut -d' ' -f1 ==> '1'

cut -d' ' -f2 ==> '' empty field between 1st and 2nd delimiter

cut -d' ' -f3 ==> '2'
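The numbering above can be checked directly. A minimal sketch, using a literal test string rather than Docker output:

```shell
line='1  2'     # "1", two spaces, "2"

printf '%s\n' "$line" | cut -d' ' -f1   # prints: 1
printf '%s\n' "$line" | cut -d' ' -f2   # prints an empty line
printf '%s\n' "$line" | cut -d' ' -f3   # prints: 2
```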

So, in your case use sed to replace each run of consecutive spaces with a single space:

docker images | sed 's/  */ /g' | cut -d " " -f1,3

If the output uses fixed column widths, then you can use this variant of cut:

docker images | cut -c1-20,41-60

This will cut out character columns 1 to 20 (the repository) and 41 to 60, where we find the image ID.

If ever the output uses TAB for padding, you should use expand -t n to make the output consistently space padded then apply the appropriate cut -cx,y, e.g. (numbers may need adjusting):

docker images | expand -t 4 | cut -c1-20,41-60
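How expand turns tabs into predictable character positions can be seen on a made-up line. This sketch assumes tab stops every 8 columns; `tabbed_line` and its content are hypothetical and do not come from Docker.

```shell
# Hypothetical tab-separated line (not real Docker output).
tabbed_line() {
    printf 'mongo\tlatest\t0da05d84b1fe\n'
}

# With tab stops every 8 columns, "mongo" is padded to column 8, so
# "latest" occupies columns 9-14; it is padded to column 16, so the
# 12-character ID occupies columns 17-28.
tabbed_line | expand -t 8 | cut -c9-14
tabbed_line | expand -t 8 | cut -c17-28
```

A different tab stop value shifts these positions, which is why the numbers passed to cut -c have to match the value given to expand -t.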

Upvotes: 2

oguz ismail

Reputation: 50795

Your input seems to have a fixed width of 20 chars for each field, so you can make use of gawk's FIELDWIDTHS feature.

$ awk -v FIELDWIDTHS="20 20 20 20 20" '{ print $3 }' file
IMAGE ID
806f56c84444
0da05d84b1fe
$
$ awk -v FIELDWIDTHS="20 20 20 20 20" '{ printf "%20s%20s\n", $1, $3 }' file
REPOSITORY          IMAGE ID
jenkins/jenkins     806f56c84444
mongo               0da05d84b1fe

From man gawk:

If the FIELDWIDTHS variable is set to a space-separated list of numbers, each field is expected to have fixed width, and gawk splits up the record using the specified widths. Each field width may optionally be preceded by a colon-separated value specifying the number of characters to skip before the field starts. The value of FS is ignored. Assigning a new value to FS or FPAT overrides the use of FIELDWIDTHS.
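FIELDWIDTHS is specific to gawk. For other awk implementations, roughly the same split can be sketched with substr. This is an approximation, not a reimplementation of the gawk feature: `sample_images` is a hypothetical stand-in for `docker images`, `fixed_field` is an illustrative name, and the sketch trims trailing padding from the selected field.

```shell
# Hypothetical stand-in for `docker images`; reproduces the question's sample.
sample_images() {
    printf '%-20s%-20s%-20s%-20s%s\n' \
        'REPOSITORY' 'TAG' 'IMAGE ID' 'CREATED' 'SIZE' \
        'jenkins/jenkins' 'lts' '806f56c84444' '8 days ago' '703MB' \
        'mongo' 'latest' '0da05d84b1fe' '2 weeks ago' '394MB'
}

# fixed_field WIDTHS N: print fixed-width field N, where WIDTHS is a
# space-separated list of column widths.
fixed_field() {
    awk -v widths="$1" -v want="$2" '
        BEGIN { n = split(widths, w, " ") }
        {
            pos = 1                              # 1-based start of the current field
            for (i = 1; i <= n; i++) {
                if (i == want) {
                    val = substr($0, pos, w[i])
                    sub(/ +$/, "", val)          # trim trailing padding
                    print val
                }
                pos += w[i]
            }
        }
    '
}

sample_images | fixed_field '20 20 20 20 20' 3
```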

Upvotes: 2
