Reputation: 13
I have an input file called input.txt that looks like this:
powerOf|creating new file|failure
creatEd|new file creating|failure
powerAp|powerof server|failureof file
I extract the text before the first capital letter in the first field and store those snippets in output.txt:
power
creat
I used the sed command to separate out the values and it's working fine.
For each value in the output file (output.txt), I need to grep the first field of input.txt, and the output should look like this:
power
power:powerOf|creating new file|failure,powerAp|powerof server|failureof file
creat
creat:creatEd|new file creating|failure
I have tried a few ways but I'm not getting the expected output.
I tried the following but I'm getting duplicate entries:
cat input.txt | cut -d '|' -f1 >> input1.txt
cat input1.txt | sed 's/\([a-z]\)\([A-Z]\)/\1 \2/g' >> output.txt
while read -r line; do
    echo $line
    cat input.txt | cut -d '|' -f1 | grep $line >> output1.txt
done < "output.txt"
I have 20000 lines in the input file. I don't know why I am getting duplicates in the output. What am I doing wrong?
Upvotes: 1
Views: 133
Reputation: 5962
Bash solution:
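#!/bin/bash
# Group lines read from stdin by the lowercase prefix of their first field.
keys=()          # prefixes in order of first appearance
declare -A map   # prefix -> comma-separated list of lines
while IFS= read -r line; do
    # First field, truncated at its first uppercase letter
    key=$(echo "${line}" | cut -d '|' -f1 | sed -e 's/[[:upper:]].*$//')
    if [[ -z "${map[$key]}" ]]; then
        keys+=("${key}")
        map[$key]="${line}"
    else
        map[$key]+=",${line}"
    fi
done
for key in "${keys[@]}"; do
    echo "${key}"
    echo "${key}:${map[$key]}"
done
exit 0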
Maybe a Perl solution is acceptable for OP too:
#!/usr/bin/perl
use strict;
use warnings;
my @keys;   # prefixes in order of first appearance
my %map;    # prefix => array ref of matching lines
while (<>) {
    chomp;
    # Leading run of lowercase letters; skip lines that have none
    my ($key) = /^([[:lower:]]+)/ or next;
    if (not exists $map{$key}) {
        push(@keys, $key);
        $map{$key} = [];
    }
    push(@{ $map{$key} }, $_);
}
foreach my $key (@keys) {
    print "$key\n";
    print "$key:", join(",", @{ $map{$key} }), "\n";
}
exit 0;
Test with your given input:
$ perl dummy.pl <dummy.txt
power
power:powerOf|creating new file|failure,powerAp|powerof server|failureof file
creat
creat:creatEd|new file creating|failure
UPDATE, after the OP restated the original problem. This variant of the body of the first (while) loop stores only the second column of the input instead of the whole line:
    message=$(echo "${line}" | cut -d '|' -f2)
    if [[ -z "${map[$key]}" ]]; then
        keys+=("${key}")
        map[$key]="${message}"
    else
        map[$key]+=",${message}"
    fi
Test with your given input:
$ perl dummy.pl <dummy.txt
power
power:creating new file,powerof server
creat
creat:new file creating
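With 20,000 input lines, the Bash loop above starts a cut and a sed process for every line, which may be noticeably slow. The following is only a sketch of the same grouping done with Bash parameter expansion instead. The function name group_by_prefix is made up for illustration, it keeps the whole line rather than just the second column, and skipping lines that yield an empty key is an assumption, not something the OP asked for.

```shell
#!/bin/bash
# Sketch: group lines from stdin by the lowercase prefix of field 1,
# using parameter expansion instead of cut/sed (requires Bash 4+).
# group_by_prefix is a hypothetical name chosen for this example.
group_by_prefix() {
    local line field key
    local -a keys=()
    local -A map=()
    while IFS= read -r line; do
        field=${line%%|*}            # text before the first '|'
        key=${field%%[[:upper:]]*}   # cut at the first uppercase letter
        [[ -n $key ]] || continue    # assumption: skip lines with an empty prefix
        if [[ -z ${map[$key]+set} ]]; then
            keys+=("$key")           # remember first-seen order
            map[$key]=$line
        else
            map[$key]+=",$line"
        fi
    done
    for key in "${keys[@]}"; do
        printf '%s\n%s:%s\n' "$key" "$key" "${map[$key]}"
    done
}
```

It would be called as group_by_prefix < input.txt.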
Upvotes: 2
Reputation: 189357
Factoring out the useless uses of cat and other antipatterns, you are basically doing
# XXX not a solution, just a refactoring of your code
sed 's/\([a-z]\)\([A-Z]\).*/\1/' input.txt | grep -f - input.txt
which extracts the lines just fine, but does nothing to join them. If you want to merge lines with the same prefix values, a simple Awk script will probably do what you need.
awk -F'|' '{ key = $1; sub(/[A-Z].*/, "", key)
    b[key] = (key in b ? b[key] "," : key ":") $0 }
  END { for (k in b) print b[k] }' input.txt
We extract the prefix into key. If it's a key we have seen before (in which case it already exists in the associative array b), we append a comma and the current line to the previous value; otherwise we initialize the array value to the key itself and a colon, followed by the current line. When we are done, we loop through the accumulated keys and print the value we have stored for each.
If the lines are long, 20,000 lines might not fit into memory at once, but if your example is representative, this should be an unremarkable task on even modest hardware.
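Note that Awk does not define the order in which for (k in b) visits the keys, and the script above prints only the joined line, not the bare key on a line of its own as in the expected output. A possible variant that handles both is sketched below. The wrapper name group_in_order and the array names order and lines are invented for this example, and splitting on | with -F is an assumption about the input format.

```shell
# Sketch: same grouping, but keys are printed in first-seen order and
# each key is also printed on its own line before the joined entry.
# group_in_order is a hypothetical helper; it reads files given as
# arguments, or stdin when called without arguments.
group_in_order() {
    awk -F'|' '
        {
            key = $1
            sub(/[A-Z].*/, "", key)      # prefix before the first capital
            if (!(key in lines)) {
                order[++n] = key         # remember the order of first appearance
                lines[key] = $0
            } else {
                lines[key] = lines[key] "," $0
            }
        }
        END {
            for (i = 1; i <= n; i++) {
                print order[i]
                print order[i] ":" lines[order[i]]
            }
        }
    ' "$@"
}
```

It would be called as group_in_order input.txt.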
Upvotes: 2