Reputation: 1
File 1 (big file):
A B C TIMESTAMP
4 4 4 12344653380
5 5 5 12344653401
6 6 6 12344653411
7 7 7 12344653433
8 8 8 12344653460
9 9 9 12344653466
File 2 (small file)(samples from every 30 sec):
D E F TIMESTAMP
1 1 1 12344653400
2 2 2 12344653430
3 3 3 12344653460
File 3:
merge every line where timestamp is between the interval of TIMESTAMP on file 2
A B C TIMESTAMP D E F
5 5 5 12344653401 1 1 1
6 6 6 12344653411 1 1 1
7 7 7 12344653433 2 2 2
8 8 8 12344653460 3 3 3
I have found lots of answers like these How to merge two files using AWK? but they always focus on a match.
Upvotes: 0
Views: 375
Reputation: 133518
Could you please try following and let me know if this helps you.
awk 'FNR==NR && FNR>1{MIN=MIN>$NF?NF:(MIN?MIN:$NF);MAX=MAX>$NF?MAX:$NF;next} ($NF>=MIN && $NF<=MAX)' file2 file1
You could take above's output into file3.
Explanation of awk
command above:
awk 'FNR==NR && FNR>1{
FNR
and NR
are 2 awk's built-in variables which will let us know the number of lines for any file, so the NR
variable's value will be increasing until the last file being read. As we know, awk
can read multiple files at a time, but unlike NR
, FNR
's value will be reset at the beginning of each new file being read. I give the condition FNR==NR
which means this condition will only be TRUE
when first file is being read. Additionally, I am using FNR>1
to make sure file 2's first line shouldn't be read -- as it has headers in it instead of timings. So, when both of these are TRUE
the next actions should take place.
MIN=MIN>$NF?NF:(MIN?MIN:$NF);
Create a variable named MIN
which has a condition if MIN
's value is greater than $NF
(which is last field in awk
, where $
refers to the field's value and NF
refers to the number of fields). If this condition is TRUE
then the action after wild character ?
will be done, which is changing MIN
's value to $NF
's. If that condition is NOT TRUE
then the action/condition after :
will be executed. So, here again there is one condition which checks if MIN
is NULL
then keeps it's value to $NF
else keep MIN
value as it is.
MAX=MAX>$NF?MAX:$NF;
Creating a variable named MAX
, and add a condition to check if MAX
's value is greater than $NF
's value, if this is TRUE
then action after ?
, which is to keep MAX's value as it is, will be executed. If FALSE
then the action after :
, which is to change it's value to $NF
, will be executed.
next}
Using awk
's built in keyword next
here which means leave all next statements further so awk
's cursor will move to the beginning of the statements.
($NF>=MIN && $NF<=MAX)
Now we need to check a condition (which will be only executed when first file named file2 is being read completely), as described above the next
statement, this will not allow it to execute this condition. Check here if $NF
's value is greater or equal to variable MIN
's value and less or equal to MAX
's value then print the current line, though there is nothing written to print here. awk
works on condition/action, so if a condition is TRUE
, then certain actions should be placed. No action is described here, so the default action of printing the current line will happen.
' file2 file1
Mentioning the Input_files named file2 and file1 here.
Upvotes: 3
Reputation: 42999
Try this:
Upvotes: 0