Mohan

Need help with a shell script to produce the expected output

I have an input file called input.txt like this:

powerOf|creating new file|failure
creatEd|new file creating|failure
powerAp|powerof server|failureof file

I extract the text up to just before the first capital letter in the first field and store those snippets in output.txt:

power
creat

I used the sed command to separate out the values and it's working fine.
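For reference, here is a minimal sketch of that extraction step, assuming the prefix is everything in the first |-separated field before the first uppercase letter. extract_prefixes is a hypothetical helper name, and the final awk '!seen[$0]++' stage is an addition that is not in the original attempt: it keeps only the first occurrence of each prefix, so power is listed once even though two input lines start with it.

```bash
#!/bin/bash
# extract_prefixes: hypothetical helper name used for illustration.
# Reads pipe-delimited lines on stdin and prints each distinct prefix once.
extract_prefixes() {
    # 1. cut keeps only the first |-separated field (e.g. powerOf)
    # 2. sed removes everything from the first uppercase letter onward (powerOf -> power)
    # 3. awk prints each prefix only the first time it is seen, preserving input order
    cut -d '|' -f1 | sed 's/[[:upper:]].*//' | awk '!seen[$0]++'
}

# Usage: extract_prefixes < input.txt > output.txt
```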

Using the prefixes in the output file (output.txt), I need to grep the matching lines of input.txt by their first field, and the output should look like this:

power
power:powerOf|creating new file|failure,powerAp|powerof server|failureof file
creat
creat:creatEd|new file creating|failure

I have tried a few ways but I'm not getting the expected output.

I tried the following but I'm getting duplicate entries:

cat input.txt | cut -d '|' -f1 >> input1.txt
cat input1.txt | sed 's/\([a-z]\)\([A-Z]\)/\1 \2/g' >> output.txt
while read -r line; do
  echo "$line"
  cat input.txt | cut -d '|' -f1 | grep "$line" >> output1.txt
done < "output.txt"

I have 20000 lines in the input file. I don't know why I am getting duplicates in the output. What am I doing wrong?

Answers (2)

Stefan Becker

Bash solution:

#!/bin/bash
keys=()
declare -A map
while IFS= read -r line; do
    # prefix = first field up to (not including) the first uppercase letter
    key=$(echo "${line}" | cut -d \| -f1 | sed -e 's/[[:upper:]].*$//')
    if [[ -z "${map[$key]}" ]]; then
        keys+=("${key}")
        map[$key]="${line}"
    else
        map[$key]+=",${line}"
    fi
done

for key in "${keys[@]}"; do
    echo "${key}"
    echo "${key}:${map[$key]}"
done

exit 0
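A possible variation, not part of the original answer: with 20000 input lines, running a command substitution with cut and sed once per line adds up, and bash parameter expansion can extract the same prefix without any subprocesses. group_lines is a hypothetical name, and the sketch assumes every line's first field starts with at least one lowercase letter, so the key is never empty.

```bash
#!/bin/bash
# group_lines: hypothetical name. Reads lines on stdin and prints, for each prefix,
# the prefix on its own line followed by "prefix:line1,line2,...".
# Assumes each line's first field starts with at least one lowercase letter.
group_lines() {
    local line key
    local -a keys=()
    local -A map=()
    while IFS= read -r line; do
        key=${line%%|*}            # first field: everything before the first |
        key=${key%%[[:upper:]]*}   # drop everything from the first uppercase letter on
        if [[ -n ${map[$key]+set} ]]; then
            map[$key]+=",${line}"  # seen before: append with a comma
        else
            keys+=("$key")         # first time: remember the order and start the value
            map[$key]=$line
        fi
    done
    for key in "${keys[@]}"; do
        printf '%s\n' "$key" "$key:${map[$key]}"
    done
}

# Usage: group_lines < input.txt
```

The output format matches the script above; the only change is how the key is computed.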

Maybe a Perl solution is acceptable for the OP too:

#!/usr/bin/perl
use strict;
use warnings;

my @keys;
my %map;
while (<>) {
    chomp;
    my($key) = /^([[:lower:]]+)/;
    if (not exists $map{$key}) {
        push(@keys, $key);
        $map{$key} = [];
    }
    push(@{ $map{$key} }, $_);
}

foreach my $key (@keys) {
    print "$key\n";
    print "$key:", join(",", @{ $map{$key} }), "\n";
}


exit 0;

Test with your given input:

$ perl dummy.pl <dummy.txt
power
power:powerOf|creating new file|failure,powerAp|powerof server|failureof file
creat
creat:creatEd|new file creating|failure

UPDATE after the OP restated the original problem: a version of the first loop that only collects the 2nd column of the input instead of the whole line (these lines go inside the while loop, after key has been computed, replacing the original if/else):

    message=$(echo "${line}" | cut -d \| -f2)
    if [[ -z "${map[$key]}" ]]; then
        keys+=("${key}")
        map[$key]="${message}"
    else
        map[$key]+=",${message}"
    fi

Test with your given input:

$ perl dummy.pl <dummy.txt
power
power:creating new file,powerof server
creat
creat:new file creating
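The column-2 variant can also be sketched by letting read split each line on | itself instead of calling cut. group_messages is a hypothetical name; the sketch assumes at least two fields per line, and any fields after the second land in rest and are ignored.

```bash
#!/bin/bash
# group_messages: hypothetical name. Groups the 2nd column by prefix, reading stdin.
group_messages() {
    local first message rest key
    local -a keys=()
    local -A map=()
    # IFS='|' makes read split on the pipe only, so spaces inside fields are kept;
    # rest soaks up the third and any later fields
    while IFS='|' read -r first message rest; do
        key=${first%%[[:upper:]]*}   # prefix before the first uppercase letter
        if [[ -n ${map[$key]+set} ]]; then
            map[$key]+=",${message}"
        else
            keys+=("$key")
            map[$key]=$message
        fi
    done
    for key in "${keys[@]}"; do
        printf '%s\n' "$key" "$key:${map[$key]}"
    done
}

# Usage: group_messages < input.txt
```

With the sample input this prints the same four lines as the test output above.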

tripleee

Factoring out the useless uses of cat and other antipatterns, you are basically doing this:

# XXX not a solution, just a refactoring of your code
sed 's/\([a-z]\)\([A-Z]\).*/\1/' input.txt | grep -f - input.txt

which extracts the lines just fine, but does nothing to join them. If you want to merge lines with the same prefix values, a simple Awk script will probably do what you need.

awk -F '|' '{ key=$1; sub(/[A-Z].*/, "", key)
      b[key] = (key in b ? b[key] "," : key ":" ) $0 }
    END { for(k in b) print b[k] }' input.txt

We extract the prefix of the first |-separated field into key. If it's a key we have seen before (in which case it already exists in the associative array b), we start from the previous value plus a comma; otherwise we start from the key itself plus a colon. Either way, the current line is appended. When we are done, we loop through the accumulated keys and print the value we have stored for each. Note that for (k in b) visits the keys in an unspecified order, so the groups may not come out in input order.
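If the output should keep the input order and also show the bare prefix line from the question, one option is to record each key the first time it appears in a second, numerically indexed array. This is a sketch of that variation, not part of the answer's code; group_ordered is a hypothetical wrapper name.

```bash
#!/bin/bash
# group_ordered: hypothetical wrapper name; pass files as arguments or pipe on stdin.
group_ordered() {
    awk -F '|' '
        {
            key = $1
            sub(/[A-Z].*/, "", key)      # prefix before the first uppercase letter
            if (!(key in b)) {
                order[++n] = key         # remember the order keys are first seen
                b[key] = key ":" $0
            } else {
                b[key] = b[key] "," $0
            }
        }
        END {
            # Print the bare prefix, then the joined line, as in the question.
            for (i = 1; i <= n; i++) {
                print order[i]
                print b[order[i]]
            }
        }' "$@"
}

# Usage: group_ordered input.txt
```

The extra order array costs one entry per distinct prefix, which is small next to the joined lines themselves.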

If the lines are long, 20,000 lines might not fit into memory at once, but if your example is representative, this should be an unremarkable task on even modest hardware.
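If memory ever does become a concern, one common workaround is decorate, sort, then stream: tag each line with its key, let sort(1) bring equal keys together (it can spill to temporary files), and join each group in a single pass. This is a sketch under those assumptions: group_streaming is a hypothetical name, the -s (stable sort) option is supported by GNU and BSD sort, and the groups come out in sorted key order rather than input order.

```bash
#!/bin/bash
# group_streaming: hypothetical name; pass files as arguments or pipe on stdin.
# 1. Decorate: prefix each line with its key and a tab.
# 2. Sort on the key only; -s keeps the original order of lines within a key.
# 3. Stream: the last awk only holds the group it is currently building.
group_streaming() {
    awk -F '|' '{ key = $1; sub(/[A-Z].*/, "", key); print key "\t" $0 }' "$@" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 -s |
        awk -F '\t' '
            $1 != prev {
                if (NR > 1) print group
                prev = $1
                group = $1 ":" substr($0, length($1) + 2)
                next
            }
            { group = group "," substr($0, length($1) + 2) }
            END { if (NR > 0) print group }'
}

# Usage: group_streaming input.txt
```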
