I need your expertise again. I am trying to use a conditional in awk to pick out the columns.
If I look at $5, the data sometimes holds a year and in some places a time.
When the year is there it's fine to print it, but where I have a time like 05:17:27, I need to print the last field instead. These are the values of $5:
2021
2021
05:17:27
20:33:17
05:17:20
2020
2020
2021
2020
2021
Below is my sample data.
yogutdb01 Mon 28 Jun 2021 11:19:56 PM MST
yogutdb02 Thu 30 Sep 2021 02:02:53 AM MST
yogutdb03 Thu Jul 13 05:17:27 2017
yogutdb04 Fri Jun 23 20:33:17 2017
yogutdb05 Thu Jul 13 05:17:20 2017
yogutdb06 Wed 24 Jun 2020 03:49:16 PM MST
yogutdb07 Wed 24 Jun 2020 04:05:10 PM MST
yogutdb08 Sat 22 May 2021 04:19:14 AM MST
yogutdb09 Thu 09 Apr 2020 12:16:32 PM CEST
yogutdb10 Tue 11 May 2021 03:03:02 PM MST
I tried an if/else condition:
$ awk '{ ($5=="[^0-9]+$")print $1,$2,$3,$4,$5; else print $1,$2,$3,$4,$NF}' my_data.text
Desired output:
yogutdb01 2021
yogutdb02 2021
yogutdb03 2017
yogutdb04 2017
yogutdb05 2017
yogutdb06 2020
yogutdb07 2020
yogutdb08 2021
yogutdb09 2020
yogutdb10 2021
OR
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
You could print the first 4 fields and check whether the 5th field consists only of digits (the year). If it does not, print the last field instead.
awk '{print $1, $2, $3, $4, ($5 ~ /^[0-9]+$/ ? $5 : $NF)}' my_data.text
Output
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
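The regex ^[0-9]+$ above accepts any run of digits. If you want the check to require exactly four digits, a sketch like the following could be used. The function name first_year_cols is hypothetical. The bracket expression is repeated four times rather than written as {4} because some awk implementations (older mawk releases, for example) do not support interval expressions.

```shell
# Hypothetical helper: reads rows in the sample format on stdin.
# Keeps $5 only when it is exactly four digits (a year); otherwise prints the last field.
first_year_cols() {
  awk '{
    year = ($5 ~ /^[0-9][0-9][0-9][0-9]$/) ? $5 : $NF
    print $1, $2, $3, $4, year
  }'
}
```

Usage, assuming the data is in my_data.text: first_year_cols < my_data.text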
UPDATE: a new version that also fixes the swapped month/day order in columns 3 and 4:
echo "${aaaaa}" \
\
| mawk 'NF=_+!($_=$(!+$NF?_:NF))*($3=$(2+2^(\
__= $4 ~ /^[0-3][0-9]$/)) \
substr("",$4=$(4-__)))' \_=5
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu 13 Jul 2017 *** fixed these 3 rows
yogutdb04 Fri 23 Jun 2017 ***
yogutdb05 Thu 13 Jul 2017 ***
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
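The golfed one-liner above is hard to read. Below is a longer sketch of the same idea, not the answer's exact code. It takes the year from $5 or $NF, and when $4 looks like a two-digit day (the same /^[0-3][0-9]$/ test the answer uses), it swaps that field with the month. The function name normalize_dates is hypothetical.

```shell
# Hypothetical helper: normalizes rows to "host weekday day month year".
# Assumption: in the "Thu Jul 13 ..." layout, $4 is a two-digit day.
normalize_dates() {
  awk '{
    year = ($5 ~ /^[0-9][0-9][0-9][0-9]$/) ? $5 : $NF
    day = $3; mon = $4
    if ($4 ~ /^[0-3][0-9]$/) { day = $4; mon = $3 }   # month came first, so swap
    print $1, $2, day, mon, year
  }'
}
```

With this, a row such as "yogutdb03 Thu Jul 13 05:17:27 2017" comes out as "yogutdb03 Thu 13 Jul 2017".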
The first option assumes that $NF holds no numeric data other than a 4-digit year.
The second option (shown for both mawk and gawk) performs a more thorough year check. Both assign the proper year value into $5, then assign to NF to trim away all the excess fields to its right.
< datafile.txt \
\
| mawk 'NF=_^($_=$(!+$NF?_:NF))^!_' \_=5
or
| mawk 'NF= +_+($_=$(/[ ][012][0-9][0-9][0-9]$/? NF :_))*!_' \_=5
| gawk 'NF= _+!($_=$(/[ ][0-2][0-9]{3}$/ ? NF :_))' \_=5
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
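The NF-assignment trick can also be written out in plain awk. This sketch assumes that a trailing 4-digit field means the year is at $NF, and that otherwise $5 already holds the year. The name trim_to_year is hypothetical. The code relies on awk rebuilding $0 when NF is decreased. gawk and mawk both do this, but the gawk manual notes that some older awks do not.

```shell
# Hypothetical helper: written-out version of the NF-trimming one-liners.
trim_to_year() {
  awk '{
    if ($NF ~ /^[0-9][0-9][0-9][0-9]$/) $5 = $NF   # move the trailing year into $5
    NF = 5                                         # drop every field after $5
    print
  }'
}
```

Usage: trim_to_year < datafile.txt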
Based on your desired output, the following should work. You can use a regular expression match with the ~ operator. Here !~ checks that $5 does not contain a colon, i.e. that it is not a time:
$ awk '{ if ($5 !~ /:/) { print $1,$2,$3,$4,$5; next } { print $1,$2,$3,$4, $NF } }' exampl_data1
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
As @tshiono also asked in the comments, if you want the day and month in the same order on every row, you can swap $3 and $4 in the branch that handles the time rows:
$ awk '{ if ($5 !~ /:/) { print $1, $2, $3, $4, $5; next } { print $1, $2, $4, $3, $NF } }' exampl_data1
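You cannot use the == operator to test a regex match, because == compares the strings literally. Instead you can use the match() function or the ~ operator. Also, the ^ anchor belongs in front of [0-9], not inside the brackets, where it negates the character class. Your attempt is also missing the if keyword before the condition. Then would you please try: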
awk '{if (match($5,/^[0-9]+$/)) print $1, $2, $3, $4, $5; else print $1, $2, $3, $4, $NF}' my_data.text
Output:
yogutdb01 Mon 28 Jun 2021
yogutdb02 Thu 30 Sep 2021
yogutdb03 Thu Jul 13 2017
yogutdb04 Fri Jun 23 2017
yogutdb05 Thu Jul 13 2017
yogutdb06 Wed 24 Jun 2020
yogutdb07 Wed 24 Jun 2020
yogutdb08 Sat 22 May 2021
yogutdb09 Thu 09 Apr 2020
yogutdb10 Tue 11 May 2021
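To see the problems with the original condition side by side, the following sketch evaluates each test on the two kinds of $5 value (a year and a time) and prints 1 for true and 0 for false. The function name compare_tests is hypothetical.

```shell
# Hypothetical helper: compares == against ~ and match() on sample $5 values.
compare_tests() {
  awk 'BEGIN {
    n = split("2021 05:17:27", v)
    for (i = 1; i <= n; i++) {
      s = v[i]
      eq  = (s == "[^0-9]+$")      # == compares strings literally; no regex involved
      dig = (s ~ /^[0-9]+$/)       # ^ outside the brackets anchors: whole value is digits
      non = (s ~ /[^0-9]/)         # ^ inside the brackets negates: contains a non-digit
      m   = match(s, /^[0-9]+$/)   # match() returns the start position, or 0 if no match
      printf "%s eq=%d digits=%d nondigit=%d match=%d\n", s, eq, dig, non, m
    }
  }'
}
```

Running compare_tests prints "2021 eq=0 digits=1 nondigit=0 match=1" and "05:17:27 eq=0 digits=0 nondigit=1 match=0". This shows that the == test never matches either value.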
Here is an alternative using the ~ operator:
awk '$5 ~ /^[0-9]+$/ {print $1, $2, $3, $4, $5; next} {print $1, $2, $3, $4, $NF}' my_data.text