Reputation: 67221
I need to search for a pattern in files. For example the content of the file is below:
3555005!K!00630000078!C!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!47001231000000!0!336344324!1!1!POST!USAGE!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
Here I want to search for lines with !D! and the 7th field in the line is less than the system date, then I want to delete the line and save the file.
Is that possible?
Upvotes: 2
Views: 2243
Reputation: 67221
i tested with the sample data in a file
3555005!K!00630000078!C!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090912000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090912000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!D!16042296!DUMMY!20090805235959!0!20090917000000!0!336344324!1!1!POST!USAGE!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090919000000!0!336344324!1!1!POST!USAGE!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090914000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090915000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090913000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090912000000!0!336344324!1!1!POST!USAGE!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090912000000!0!336344324!1!1!POST!USAGE!336344324!0!
but it's is deleting all the lines which consists of !D!. I used the following awk script
# *** Simple AWK script to delete lines from log file ***
# Rule: keep all lines except these that have their 2nd
# field equal to "D" and their 7th field more than
# current date time
BEGIN {
FS = "!";
#delimiter
stopDate = "date +%Y%m%d%H%M%S";
# stopDate = 47001231000001; for test purposes
deletedLineCtr = 0; #diagnostics counter, unused at this time
}
{
if ( match($2, "D") && ($7 < stopDate) )
{
deletedLineCtr++;
}
else
print $0
}
Am I doing anything wrong?
Upvotes: 1
Reputation: 360105
Based on mjv's answer, but simplified and using (assuming) the fifth field for the date (broken into two lines for readability):
awk -F! 'BEGIN {stopdate=strftime("%Y%m%d%H%M%S",systime())}
$2 != "D" || $5 >= stopdate {print}' file.log > newfile.log
Upvotes: 2
Reputation: 75125
If you prefer AWK...
awk -f logstrip.awk in.log > out.log
where logstrip.awk looks something like
# *** Simple AWK script to delete lines from log file ***
# Rule: keep all lines except these that have their 2nd
# field equal to "D" and their 7th field more than
# current date time
BEGIN {
FS = "!"; #delimiter
stopDate = systime();
# stopDate = 47001231000001; for test purposes
deletedLineCtr = 0; #diagnostics counter, unused at this time
}
{
if (match($2, "D") && ($7 < stopDate) ) {
deletedLineCtr++;
}
else
print $0
}
should do the trick.
Attention, however, your field #7 contains an odd date format. I think I recognize an recent epoch value (123...) but it is preceded by 4 apparently unrelated digit. These can easily be removed before comparing to StopDate
Upvotes: 2
Reputation: 10403
Something like this should do the trick... you may want to parse the time if this is not how you have the field formatted
perl -ne '/^([^!]+!){6}([^!]+).*/; print if $2 < time && /!D!/;'
Upvotes: 3