Novice
Novice

Reputation: 169

Merging two files based on a common column

I have 2 files one is the ouput of telnet command and another is ip and host file. I want to merge these two files based on common ip.

file1:

25-08-2019_22.00.03 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ]
25-08-2019_22.00.03 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ] 

and file 2 is:

http://ip1:port1/cs/personal, CS
http://ip2:port2/cs/Satellite/out/, CS_SAT

and a want an output file like below:

25-08-2019_22.00.03 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ] : CS
25-08-2019_22.00.03 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ] : CS_SAT

I am no expert in linux, any help is highly appreciated.

I tried join option like join -o file file2, but it does not give desired output.

I tried awk command like creating key value pair for 1st file and running over 2nd file, but it does not give any ouput, is it ecause of sepatarors or special characters in files ?

awk 'FNR==NR{a[$2]=$1;next}{if(a[$1]==""){a[$1]=0};
    print $1,$2,a[$1]}' file1 file2

Upvotes: 1

Views: 74

Answers (2)

oliv
oliv

Reputation: 13249

Using cut and paste:

paste -d " : " file1 <(cut -s -d ',' -f2 file2)

This joins both file line per line with : in between.

The second file is modified to only have the second part of line based on the comma ,.

Upvotes: 1

KamilCuk
KamilCuk

Reputation: 141020

Well, you have to preprocess the input files in some way. First, extract the common field using some regexes with sed, then join. After that you have transform the output to match your expected.

Comments in code:

# recreate input
cat <<EOF >file1
25-08-2019_22.00.03 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ]
25-08-2019_22.00.03 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ]
EOF
cat <<EOF >file2
http://ip1:port1/cs/personal, CS
http://ip2:port2/cs/Satellite/out/, CS_SAT
EOF

# join on the first field
join -t' ' -11 -21 <(
  # extract the part inside `http://<here>/` and put it in front of the line
  # ip1:port1 25-08-2019_22.00.03 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ]
  <file1 sed -r 's@^(.*http://([^/]*).*)$@\2 \1@' | sort
) <(
  # extract the part inside `http://<here>/` and remove all the we are not interested in
  # ip1:port1 CS
  <file2 sed -r 's@http://([^/]*)/.*, (.*)@\1 \2@' | sort
) |
# ip1:port1 25-08-2019_22.00.03 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ] CS
# remove the leading ip1:port1
cut -d' ' -f2- |
# replace the trailing ` CS` with ` : CS`
sed 's/[^ ]*$/: &/'

Upvotes: 2

Related Questions