Reputation: 2705
I am using Python to process output from the Hadoop filesystem, where each line contains a timestamp and a file name.
Since the output is long, I want to print only the lines that have not been checked yet.
For this, I am going to store a timestamp, last_ts, which is the last time the system was checked.
To print the whole output, I am using the command
hadoop fs -ls /path/to/donemarkerfiles/ | sort -k 6 |awk '{print $6" "$7" "$8} '
where $6 is the date, $7 is the time, and $8 is the file path.
I want to compare $6 and $7 with last_ts and print only the lines that are newer.
How can I do this? I tried to use an if condition in awk, but I stumbled a lot and gave up.
Sample output:
2014-06-23 05:45 /user/hdfs/warehouse/donemarkers/20140621_basic.done
2014-06-23 07:13 /user/hdfs/warehouse/donemarkers/20140621_stat.done
2014-06-23 08:08 /user/hdfs/warehouse/donemarkers/20140621_raw.done
2014-06-23 09:30 /user/hdfs/warehouse/donemarkers/20140621_join.done
2014-06-23 09:31 /user/hdfs/warehouse/donemarkers/20140621_upload_file.done
2014-06-23 15:52 /user/hdfs/warehouse/donemarkers/20140622_basic.done
2014-06-23 20:23 /user/hdfs/warehouse/donemarkers/20140622_stat.done
2014-06-23 21:40 /user/hdfs/warehouse/donemarkers/20140622_raw.done
2014-06-23 22:57 /user/hdfs/warehouse/donemarkers/20140622_join.done
2014-06-23 22:58 /user/hdfs/warehouse/donemarkers/20140622_upload_file.done
Upvotes: 0
Views: 182
Reputation: 8637
This one was ridiculously hard to figure out, I guess mainly because I'm not an awk expert. However, this does the heavy lifting:
cat /tmp/data | awk '{cmd = "date -d \"" $1 " " $2 "\" +%s"; cmd | getline secs; close(cmd); print secs, $0}'
This uses awk to call the standard date utility to convert each timestamp to epoch seconds, and getline assigns the command's output to an awk variable. That command just prints everything; now let's do the filtering.
cat /tmp/data | awk '{cmd = "date -d \"" last_ts "\" +%s"; cmd | getline mindate; close(cmd); cmd = "date -d \"" $1 " " $2 "\" +%s"; cmd | getline secs; close(cmd); if (secs > mindate) print $0}'
Now we've got two things of the form cmd | getline var in there, which seems unwieldy. I would put that in a script (with a comment), but I'd never type it 'live'. Also, I'm not clear where last_ts lives. Is it in the awk script already? If not, it has to be passed in to awk, for example with the -v option.
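Since the question mentions Python, the same filter could probably be done there without awk at all. Below is a minimal sketch, assuming last_ts is a string in the same "YYYY-MM-DD HH:MM" format as the sample output and that each line starts with the date and time columns. The names newer_than and TS_FORMAT are made up for illustration; they are not from the question.

```python
from datetime import datetime

# Assumed format, matching the "2014-06-23 05:45" prefix in the sample output.
TS_FORMAT = "%Y-%m-%d %H:%M"


def newer_than(lines, last_ts):
    """Yield lines whose leading 'date time' is strictly later than last_ts."""
    cutoff = datetime.strptime(last_ts, TS_FORMAT)
    for line in lines:
        parts = line.split(None, 2)  # date, time, rest of the line
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        stamp = datetime.strptime(parts[0] + " " + parts[1], TS_FORMAT)
        if stamp > cutoff:
            yield line.rstrip("\n")


# Three lines taken from the sample output in the question.
sample = [
    "2014-06-23 09:30 /user/hdfs/warehouse/donemarkers/20140621_join.done",
    "2014-06-23 09:31 /user/hdfs/warehouse/donemarkers/20140621_upload_file.done",
    "2014-06-23 15:52 /user/hdfs/warehouse/donemarkers/20140622_basic.done",
]

for line in newer_than(sample, "2014-06-23 09:30"):
    print(line)
```

With last_ts set to "2014-06-23 09:30", this prints only the 09:31 and 15:52 lines, because the comparison is strict. In a real script the lines would come from the output of the hadoop command rather than a hard-coded list.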
Upvotes: 2