onTheInternet

Reputation: 7263

unix formatting output variables

I am writing a script that takes fields of data and uses them to create usernames and passwords.

Here is how the data is formatted:

MWS1990 XXX-XX-XXXX STASNY, MATTHEW W SO-II BISS CPSC BS   INFO TECH   412/882-0581

Here is the program:

for linePosition in {11..22}
do
  holder=`sed -n "${linePosition}p" "$1"|awk '{print $1}'`
  holder2=`sed -n "${linePosition}p" "$1"|awk '{print $12}'`
  holder3=`sed -n "${linePosition}p" "$1"|awk '{print $7}'`
  echo "UserName"
  echo "$holder"
  echo "password"
  echo "$holder2"
  echo "$holder3"
done

It produces output like this:

UserName
MWS1990
password
412/882-0581
BISS

There are two things I would like to change:

  1. I would like to remove the year after the username, so the example above would be just MWS. What can I add to holder=`sed -n "${linePosition}p" $1|awk '{print $1}'` to make it return just the first three letters (preferably in lower case, but that is not necessary)?

  2. I would like to remove the first eight characters of the phone number, so instead of 412/882-0581 it would read 0581.

Upvotes: 1

Views: 192

Answers (3)

Rob Kielty

Reputation: 8172

Here is a revised answer:

for linePosition in {11..22}
do
  holder=`sed -n "${linePosition}p" "$1"|awk '{print $1}'`
  holder2=`sed -n "${linePosition}p" "$1"|awk '{print $12}'`
  holder3=`sed -n "${linePosition}p" "$1"|awk '{print $7}'`
  echo "UserName"
  echo `expr match "$holder" '\([A-Za-z]*\)'`
  echo "password"
  echo "${holder2: -4}"
  echo "$holder3"
done

I am sticking with Bash string manipulation, as described in the link I posted in the comments.

However, I would like to point out a caveat about this solution.

Here is a quick description of this line:

`expr match "$holder" '\([A-Za-z]*\)'`

The backticks perform command substitution: they start a subshell inside your for loop, and that subshell runs the expr command with the match operator. expr match returns the part of $holder that matches the regular expression [A-Za-z]* at the start of the string, that is, the leading run of letters (MWS for MWS1990). Ref: http://tldp.org/LDP/abs/html/string-manipulation.html
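For comparison, here is a minimal sketch of the same leading-letters extraction done two ways: with the POSIX expr STRING : REGEX form (the portable spelling of GNU expr match), and with pure Bash parameter expansion, which needs no subshell or external command at all. The function names are hypothetical, and the Bash version assumes the username is letters followed by digits, as in MWS1990; for input such as AB-12 the two versions give different results (AB versus AB-).

```shell
#!/usr/bin/env bash
# Hypothetical helpers comparing the two approaches.

# POSIX expr form: ":" is the portable spelling of GNU "expr match".
# It prints the part of $1 captured by \( \), anchored at the start,
# i.e. the leading run of letters.
leading_letters_expr() {
    expr "$1" : '\([A-Za-z]*\)'
}

# Pure Bash form: "%%[0-9]*" deletes the longest suffix that starts
# with a digit, so no subshell or external command is started.
leading_letters_bash() {
    printf '%s\n' "${1%%[0-9]*}"
}

holder="MWS1990"
leading_letters_expr "$holder"   # MWS
leading_letters_bash "$holder"   # MWS
```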

If your data file is short, this will be fine.

However, if your script has to process a large data file, I would suggest that you look at Olaf's solution.

Why?

If you are processing a very large file, or you do not know in advance how large the file your script will process is, it is best to avoid starting subshells inside loops.

Olaf's solution, which uses awk to do all of the processing you need, has an important advantage: all the work takes place within a single process. The for loop, by contrast, forks a subshell for every command substitution and starts new sed, awk and expr processes for each line of your file. Each sed -n "${linePosition}p" call also reads through the whole file just to extract one line. That is expensive, and it can be risky inside a loop.

In your code, the for loop is currently bounded by a small range of lines, but if that range is ever widened, or a bug is introduced that makes the loop run forever, the script could noticeably degrade the performance of your machine.

So although my answer may be easier to adapt to your existing code, Olaf's answer is better if you have to process a large amount of data.
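If you want to stay in Bash rather than switch to awk, the loop can also be written so that it reads the file once and starts no sed, awk or expr processes. This is a sketch under a few assumptions: Bash 4.0 or later (for ${var,,}), whitespace-separated fields exactly as in the sample line, and the same hard-coded range of lines 11 to 22. The function name make_accounts is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one pass over the file, no command substitutions inside the loop.
# Assumes bash 4+ and whitespace-separated fields as in the sample data.
make_accounts() {
    local file=$1 lineno=0 line user
    local -a f
    while IFS= read -r line; do
        lineno=$((lineno + 1))
        (( lineno < 11 )) && continue   # skip lines before the range
        (( lineno > 22 )) && break      # stop reading after the range
        read -r -a f <<< "$line"        # split into fields f[0], f[1], ...
        user=${f[0]:0:3}                # first three characters, e.g. MWS
        echo "UserName"
        echo "${user,,}"                # lower-cased, e.g. mws
        echo "password"
        echo "${f[11]: -4}"             # last four characters of field 12
        echo "${f[6]}"                  # field 7, e.g. BISS
    done < "$file"
}

# Only run when a data file is given on the command line.
if [ "$#" -gt 0 ]; then
    make_accounts "$1"
fi
```

Because the loop breaks after line 22, the rest of a large file is not read at all.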

Upvotes: 2

Olaf Dietsche

Reputation: 74108

Since you already use awk, you can do all of the work in a single awk command:

awk 'NR >= 11 && NR <= 22 {
    print "UserName"
    print tolower(substr($1, 1, 3))   # first three letters, lower-cased
    print "password"
    print substr($12, 9)              # phone number from the 9th character on
    print $7
}' "$1"
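A possible variation on the awk command above, sketched as a hypothetical shell function: the line range is passed in with awk's -v option instead of being hard-coded, and awk exits as soon as it passes the last wanted line, so the rest of a large file is never read. The function name and its default range of 11 to 22 are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk command: usage is
#   make_accounts_awk FILE [FIRST_LINE] [LAST_LINE]
make_accounts_awk() {
    local file=$1 first=${2:-11} last=${3:-22}
    awk -v first="$first" -v last="$last" '
        NR > last { exit }    # past the range: stop reading the file
        NR >= first {
            print "UserName"
            print tolower(substr($1, 1, 3))
            print "password"
            print substr($12, 9)
            print $7
        }' "$file"
}

# Only run when arguments are given on the command line.
if [ "$#" -gt 0 ]; then
    make_accounts_awk "$@"
fi
```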

Upvotes: 3

Eero Helenius

Reputation: 2585

If you're using Bash, you can do both of those things easily with Bash substring extraction (see also here).

In other words, something like:

echo "${holder:0:3}"     # "MWS"
echo "${holder2:8}"      # "0581"

# Or, to begin indexing from the right end:

echo "${holder2:(-4)}"   # "0581"

As for converting a string to lowercase in Bash, see e.g. ghostdog74's answer here.
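For the lower-case part, here is a minimal sketch of two common options: Bash's ${var,,} case-modification expansion, which requires Bash 4.0 or later (the /bin/bash shipped with macOS is 3.2, so it will not work there), and tr with POSIX character classes, which works in any shell but starts extra processes. The variable names follow the question's code.

```shell
#!/usr/bin/env bash
# Two ways to lower-case the three-letter username prefix.
holder="MWS1990"
prefix=${holder:0:3}                                            # MWS

# Bash 4.0+ only: ",," lower-cases every character.
lower_bash=${prefix,,}                                          # mws

# Portable: tr maps upper-case letters to lower-case.
lower_tr=$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')  # mws

echo "$lower_bash"
echo "$lower_tr"
```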

Upvotes: 2
