Reputation: 456
I have two files: file1 and file2. i want to combine the time between file1 and file2 with common column as 1 and 2. I need to create a separate file which contains the file1 data and file2 mapping time column. But for some entires(see BOLD TEXT) its matching the same line from file2. Whats wrong in my command? is there any way to get the expected output?
Note: there can be more than 2 entries as shown in the bold text. column1 and column2 entries can be same while only time changes.
file 1
9950 | p | 2021-02-27 14:16:21.905
9951 | q | 2021-02-27 14:16:28.845
9954 | r | 2021-02-27 14:17:21.884
9954 | r | 2021-02-27 14:19:46.775
9956 | y | 2021-02-27 14:20:20.830
9957 | z | 2021-02-27 14:20:27.325
file 2
9950 | p | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:22.072
9954 | r | 2021-02-27 14:19:46.956
9956 | y | 2021-02-27 14:20:21.018
9957 | z | 2021-02-27 14:20:27.512
command:
awk -F'|' 'NR==FNR{a[$1,$2]=$3;next} a[$1,$2]{print $0"\t"a[$1,$2]}' file2 file1
Output:
9950 | p | 2021-02-27 14:16:21.905 | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:28.845 | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:21.884 | 2021-02-27 14:19:46.956
9954 | r | 2021-02-27 14:19:46.775 | 2021-02-27 14:19:46.956
9956 | x | 2021-02-27 14:20:20.830 | 2021-02-27 14:20:21.018
9957 | y | 2021-02-27 14:20:27.325 | 2021-02-27 14:20:27.512
Expected output:
9950 | p | 2021-02-27 14:16:21.905 | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:28.845 | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:21.884 | 2021-02-27 14:17:22.072
9954 | r | 2021-02-27 14:19:46.775 | 2021-02-27 14:19:46.956
9956 | x | 2021-02-27 14:20:20.830 | 2021-02-27 14:20:21.018
9957 | y | 2021-02-27 14:20:27.325 | 2021-02-27 14:20:27.512
Upvotes: 1
Views: 310
Reputation: 203522
If this isn't all you need then edit your question to provide more truly representative sample input/output including cases that this doesn't work for:
$ paste file1 file2 | awk '{$7=$8=$9=""; $0=$0; $1=$1}1'
9950 | p | 2021-02-27 14:16:21.905 | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:28.845 | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:21.884 | 2021-02-27 14:17:22.072
9954 | r | 2021-02-27 14:19:46.775 | 2021-02-27 14:19:46.956
9956 | y | 2021-02-27 14:20:20.830 | 2021-02-27 14:20:21.018
9957 | z | 2021-02-27 14:20:27.325 | 2021-02-27 14:20:27.512
In the above $7=$8=$9=""
set those fields to NULL
strings but does not change how many fields are in $0
, then $0=$0
causes awk to resplit $0
thereby removing those as individual fields but not altering the white space around where they existed, then $1=$1
causes awk to reconstruct $0
from its fields replacing every string that matches FS
(i.e. contiguous white space) with the string that matches OFS
(i.e. a single blank char). For example (with | tr ' ' '-'
added to make the spaces more clearly visible):
$ echo 'a b c' | awk '{print NF, $0}' | tr ' ' '-'
3-a-b-c
$ echo 'a b c' | awk '{$2=""; print NF, $0}' | tr ' ' '-'
3-a--c
$ echo 'a b c' | awk '{$2=""; $0=$0; print NF, $0}' | tr ' ' '-'
2-a--c
$ echo 'a b c' | awk '{$2=""; $0=$0; $1=$1; print NF, $0}' | tr ' ' '-'
2-a-c
Upvotes: 3
Reputation: 37404
$ awk '
BEGIN {
FS=" [|] "
OFS=" | "
}
NR==FNR {
a[$1,++c[$1]]=$0 # c[$1] is an instance counter array,
next # unique for each $1
}
{
print a[$1,++d[$1]],$NF # so is d[$1]
}' file1 file2
Output:
9950 | p | 2021-02-27 14:16:21.905 | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:28.845 | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:21.884 | 2021-02-27 14:17:22.072
9954 | r | 2021-02-27 14:19:46.775 | 2021-02-27 14:19:46.956
...
Updated with request from comments:
$ awk '
BEGIN {
FS=" [|] "
OFS=" | "
}
NR==FNR {
a[$1,++c[$1]]=$0 # c[$1] is an instance counter array,
next # unique for each $1
}
(($1,++d[$1]) in a) { # only output if match in both files
print a[$1,d[$1]],$NF # so is d[$1]
}' file1 file2
Upvotes: 3
Reputation: 246807
Similar to James Brown's answer, but only needing to keep one "count" array, and using $1 plus $2 for the key
gawk -F '|' '
FNR == 1 {delete count}
{key = $1 SUBSEP $2 SUBSEP (++count[$1,$2])}
FILENAME == ARGV[1] {time[key] = $3; next}
key in time {print $0, FS, time[key]}
' file2 file1
Upvotes: 3