Split string with for loop and sed in bash shell

Question

I have the following string in a variable:

-rw-r--r-- 0 1068 1001 4870 Dec 6 11:58 1.zip -rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip

I'm trying to loop over this string with a for loop and get the following result:

Dec 6 11:58 1.zip
Dec 6 11:59 10.zip

Does anyone have the proper sed command to do this?

So let me make my question a little more clear. I do an sftp command with -b file and in there I do an ls -l *.zip. The result of this goes into a file. At first, I used a sed command to clear the first 2 lines since these are irrelevant information for me. I now only have the ls results, but they are on one line. In my example, there were just 2 zip files but there can be a lot more.

ListOfFiles=$(sed '1,2d' $LstFile) #delete first 2 lines
for line in $ListOfFiles
do
    $line=$(echo "${line}" | sed (here I want the command to only print zip file and date)
done

Jonathan Leffler · Accepted Answer

Notes on the revised scenario

The question has been modified to include a shell fragment:

ListOfFiles=$(sed '1,2d' $LstFile) #delete first 2 lines
for line in $ListOfFiles
do
    $line=$(echo "${line}" | sed # I want to print only file name and date
done

Saving the results into a variable, as in the first line, is the wrong way to deal with this. A small adaptation of the code in my original answer (below) does what you need: awk makes it very simple, and sed can also do it if you're set on using sed.

awk variant

awk 'NR <= 2 { next } { print $6, $7, $8, $9 }' "$LstFile"

The NR <= 2 { next } portion skips the first two lines; the rest is unchanged, except that the data source is the list file you downloaded.
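As a rough illustration, the awk variant can be tried against a throwaway file. This is only a sketch: the two header lines are placeholders standing in for whatever sftp writes before the listing, and the temporary file is an assumption made for the demo.

```shell
# Hypothetical stand-in for the downloaded listing; the first two lines
# are placeholder text, not real sftp output.
LstFile=$(mktemp)
cat > "$LstFile" <<'EOF'
placeholder header one
placeholder header two
-rw-r--r-- 0 1068 1001 4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip
EOF

# Skip lines 1-2, then print fields 6-9 (month, day, time, file name).
result=$(awk 'NR <= 2 { next } { print $6, $7, $8, $9 }' "$LstFile")
printf '%s\n' "$result"

rm -f "$LstFile"
```

Quoting "$LstFile" keeps the command working if the path ever contains spaces.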

sed variant

sed -nE -e '1,2d' -e 's/^([^ ]+[ ]+){5}([^ ]+([ ]+[^ ]+){3})$/\2/p' "$LstFile"

In practice, the 1,2d command is unlikely to be necessary, but it is safer to use it, just in case one of the first two lines has 9 fields. (Yes, I could avoid using the -e option twice — no, I prefer to have separate commands in separate options; it makes it easier to read IMO.)
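If a per-entry loop is still wanted, one possible sketch (not part of the commands above) reads the listing line by line with read instead of storing it in a variable first. The function name list_zip_dates and the sample file are hypothetical, and the sketch assumes each listing line has the nine-field layout shown in the question.

```shell
# Hypothetical helper: print "month day time name" for each listing line.
list_zip_dates() {
    # Drop the first two lines, then let read split each line into fields.
    # The last variable (name) receives the rest of the line.
    sed '1,2d' "$1" |
    while read -r _perms _links _owner _group _size month day time name
    do
        printf '%s %s %s %s\n' "$month" "$day" "$time" "$name"
    done
}

# Placeholder listing used only for this demonstration.
LstFile=$(mktemp)
cat > "$LstFile" <<'EOF'
placeholder header one
placeholder header two
-rw-r--r-- 0 1068 1001 4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip
EOF

result=$(list_zip_dates "$LstFile")
printf '%s\n' "$result"
rm -f "$LstFile"
```

Inside the loop body, the printf could be replaced by whatever per-file processing is required.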

An answer for the original question

If you treat this as an exercise in string manipulation (disregarding legitimate caveats about trying to parse the output from ls reliably), then you don't need sed. In fact, sed is almost definitely the wrong tool for the job — awk would be a better choice — but the shell alone could be used. For example, assuming the data is in the string $variable, you could use:

set -- $variable
echo  $6  $7  $8  $9
echo $15 $16 $17 $18

This gives you 18 positional parameters and prints the 8 you're interested in. Using awk, you might use:

echo $variable | awk '{ print $6, $7, $8, $9; print $15, $16, $17, $18 }'

Both these automatically split a string at spaces and allow you to reference the split elements with numbers. Using sed, you don't get that automatic splitting, which makes the job extremely cumbersome.
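The positional-parameter approach can be generalized beyond exactly 18 fields with a loop that consumes nine parameters at a time. This is a sketch under the assumption that every entry has exactly nine white-space-separated fields (so no file names containing spaces); the function name print_dates_and_names is hypothetical.

```shell
# Hypothetical helper: split a single-line listing into groups of 9 fields.
print_dates_and_names() {
    # Unquoted on purpose: word splitting turns the string into positional
    # parameters. Note that glob characters in the data would also expand.
    set -- $1
    while [ "$#" -ge 9 ]
    do
        echo "$6 $7 $8 $9"
        shift 9        # discard this entry and move on to the next one
    done
}

variable="-rw-r--r-- 0 1068 1001 4870 Dec 6 11:58 1.zip -rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip"
result=$(print_dates_and_names "$variable")
printf '%s\n' "$result"
```

Because set -- runs inside the function, it replaces only the function's own positional parameters, not those of the calling script.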

Suppose the variable actually holds two lines, so:

echo "$variable"

reports:

-rw-r--r-- 0 1068 1001  4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip

The code above assumed that the contents of $variable was a single line (though it would work unchanged if the variable contained two lines), but the code below assumes that it contains two lines. In fact, the code below would work if $variable contained many lines, whereas the set and awk versions are tied to '18 fields in the input'.

Assuming that the -E option to sed enables extended regular expressions, then you could use:

variable="-rw-r--r-- 0 1068 1001  4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip"
echo "$variable" |
sed -nE 's/^([^[:space:]]+[[:space:]]+){5}([^[:space:]]+([[:space:]]+[^[:space:]]+){3})$/\2/p'

That looks for a sequence of non-white-space characters followed by a sequence of white space characters, repeated 5 times, followed by a sequence of non-white-space characters and 3 sets of a sequence of white space followed by a sequence of non-white-space characters. The grouping parentheses match fields 1-5 with the first, repeated group (\1, which is ignored) and capture fields 6-9 in \2 (which is preserved); the p flag then prints the result. If you decide you can assume there are no tabs etc., you can simplify the sed command to:

echo "$variable" | sed -nE 's/^([^ ]+[ ]+){5}([^ ]+([ ]+[^ ]+){3})$/\2/p'

Both of those produce the output:

Dec 6 11:58 1.zip
Dec 6 11:59 10.zip

Dealing with the single line variant of the input is excruciating — sufficiently so that I'm not going to show it.

Note that with the two-line value in $variable, the awk version could become:

echo "$variable" | awk '{ print $6, $7, $8, $9 }'

This will also handle an arbitrary number of lines.
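For the single-line form of the input, awk can step through the fields in strides of nine. The following is a minimal sketch, assuming every entry has exactly nine fields; the variable name single_line is made up for the example.

```shell
# Single-line input, as in the original question.
single_line="-rw-r--r-- 0 1068 1001 4870 Dec 6 11:58 1.zip -rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip"

# Start at field 6 and advance 9 fields per entry, printing 4 fields each time.
result=$(echo "$single_line" |
    awk '{ for (i = 6; i + 3 <= NF; i += 9) print $i, $(i + 1), $(i + 2), $(i + 3) }')
printf '%s\n' "$result"
```

The same command also copes with one entry per line, since the loop then runs once for each nine-field line.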

Note how it is crucial to understand the difference between echo $variable and echo "$variable". The first treats all white space sequences as equivalent to a single blank but the other preserves the internal spacing. And capturing output such as with:

variable=$(ls -l 1*.zip)

preserves the spacing (especially the newline) in the assignment (see Capturing multiple line output into a Bash variable). Thus there's a moderate chance that the sed shown would work for you, but it isn't certain because you didn't answer clarifications sought before this answer was posted.
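The effect of the quotes can be checked by counting the lines each form of echo produces. This is a small sketch using the two-line sample value from earlier; the names unquoted and quoted are chosen only for the demonstration.

```shell
variable="-rw-r--r-- 0 1068 1001  4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip"

# Unquoted: the shell splits on white space (newlines included) and echo
# rejoins the words with single blanks, so everything lands on one line.
unquoted=$(echo $variable | awk 'END { print NR }')

# Quoted: the newline and the double space before 4870 are preserved.
quoted=$(echo "$variable" | awk 'END { print NR }')

echo "unquoted: $unquoted line(s), quoted: $quoted line(s)"
```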
