Reputation: 75
I have multiple files that look like this:
//file start
$thing1 = {'item1' => '0', 'item2' => '3', 'itemDate' => '2013-10-01'};
$thing2 = {'item1' => '0', 'item2' => '3', 'itemDate' => '2012-11-01'};
$thing3 = {'item1' => '0', 'item2' => '3', 'itemDate' => '2014-12-01'};
//file end
Using Unix, what is the best way to grab all of the items in a file that are dates? I know that the items I'm looking for in the file look like this:
{somethingDate = '1111-11-11'}
From this I want to grab '1111-11-11'. File one will have multiple 'fileOneDate' entries, file two will have multiple 'fileTwoDate' entries, and so on. My goal is to take all of the dates stored under '*Date' keys, remove duplicates, and sort them into an output file, which is easy enough with the sort command and pipes. However, I'm stuck on the first part. What I have so far looks like this:
<command I'm working on now that grabs dates> | sort -n > outputfile.txt
I believe the way to go would be an AWK script. What would be the right way to parse these files?
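Since an AWK script was suggested, here is a minimal AWK sketch of that first step. It assumes every key and value is wrapped in single quotes, as in the sample above, and that the wanted keys end in Date; the function name extract_dates is hypothetical. Splitting each line on the single quote puts a key, the => separator, and its value into three consecutive fields, so the loop only has to look for a field ending in Date that is followed by =>:

```shell
# extract_dates (hypothetical helper): print every value whose key ends in "Date".
# Assumes keys and values are single-quoted, e.g. 'itemDate' => '2013-10-01'.
extract_dates() {
  # -F"'" splits on single quotes, so a quoted key, the " => " between the
  # quotes, and the quoted value land in three consecutive fields.
  awk -F"'" '{
    for (i = 1; i + 2 <= NF; i++)
      if ($i ~ /Date$/ && $(i + 1) ~ /^ *=> *$/)
        print $(i + 2)
  }' "$@"
}

# Usage: collect dates from several files, drop duplicates, sort.
# extract_dates file1 file2 | sort -u > outputfile.txt
```

Because the loop scans every field, a line holding several Date entries yields all of them, one per output line.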
Upvotes: 1
Views: 67
Reputation: 247012
grep -o is the simplest way to extract text, and sort -u sorts (duh) and removes duplicates.
grep -oE '\<[0-9]{4}-[0-9]{2}-[0-9]{2}\>' <<'END' | sort -u
$thing1 = {'item1' => '0', 'item2' => '3', 'itemDate' => '2013-10-01'};
$thing2 = {'item1' => '0', 'item2' => '3', 'itemDate' => '2012-11-01'};
$thing3 = {'item1' => '0', 'item2' => '3', 'itemDate' => '2014-12-01'};
$thing2b= {'item1' => '0', 'item2' => '3', 'itemDate' => '2012-11-01'};
$thing2c= {'item1' => '0', 'item2' => '3', 'itemDate' => 'foo2012-01-01bar'};
END
2012-11-01
2013-10-01
2014-12-01
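The same pipeline extends to the several input files from the question. A sketch, assuming GNU grep (\< and \> are GNU word-boundary extensions); the function name collect_dates and the file names in the usage comment are hypothetical. The -h flag stops grep from prefixing each match with its file name, which it otherwise does when given more than one file:

```shell
# collect_dates (hypothetical helper): print the unique YYYY-MM-DD dates
# found in all of the files given as arguments.
collect_dates() {
  # -o: print only the matched text; -h: no "filename:" prefix;
  # -E: extended regex so {4} works without backslashes.
  grep -ohE '\<[0-9]{4}-[0-9]{2}-[0-9]{2}\>' "$@" | sort -u
}

# Usage (hypothetical file names):
# collect_dates file1.txt file2.txt > outputfile.txt
```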
Upvotes: 1
Reputation: 113924
If your sample file is called datefile, then:
$ sed -nr "s/.*Date' => '([^']+)'.*/\1/p" datefile | sort -n
2012-11-01
2013-10-01
2014-12-01
The above regex looks for lines containing Date' => 'datestring' and prints the datestring.
In more detail, the sed command consists of a substitution, which in sed is written as s/old/new/options. The old part is a bit complicated, so I will go through it piece by piece. The old regex looks for:
(a) .*, meaning anything (any number of any characters), followed by
(b) Date' => ', followed by
(c) ([^']+), meaning one or more characters that are not single quotes, followed by
(d) a single quote, followed by
(e) .*, again meaning anything.
If a match is made, the whole line is replaced with the date string (available as \1 because the date-string regex was in parentheses) and then, because of the p at the end of the expression, that date is printed. Because the -n option is given to sed, lines with no matching date string are not printed.
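One consequence of the leading .* being greedy: if a single line holds more than one Date' => '...' pair, the leading .* swallows all but the last one, so only the last date on that line is printed. A quick check with a made-up two-date line, assuming GNU sed for -r:

```shell
# Made-up input: two Date entries on one line.
line="{'startDate' => '2001-01-01', 'endDate' => '2002-02-02'}"
# The greedy leading .* consumes the first pair, so this prints only 2002-02-02.
echo "$line" | sed -nr "s/.*Date' => '([^']+)'.*/\1/p"
```

If every date on such a line is needed, the grep -o approach from the other answer prints each match separately.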
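If your sed does not support -r (for example, the BSD sed on OS X, which uses -E for extended regexes instead), then use the basic-regex form of the same expression, with backslashes before the parentheses and \{1,\} in place of + (plain basic regexes have no \+):
sed -n "s/.*Date' => '\([^']\{1,\}\)'.*/\1/p" datefile | sort -n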
Upvotes: 0
Reputation: 14949
Do you need something like this?
sed -n "s/.*'\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)'.*/\1/p"
If your sed has the -r option:
sed -nr "s/.*'([0-9]{4}-[0-9]{2}-[0-9]{2})'.*/\1/p"
Test:
sat:~# echo "{somethingDate = '1111-11-11'}" | sed -n "s/.*'\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)'.*/\1/p"
1111-11-11
sat:~#
sat:~# echo "\$thing1 = {'item1' => '0', 'item2 => '3', 'itemDate' => '2013-10-01'};" | sed -n "s/.*'\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)'.*/\1/p"
2013-10-01
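Note that this pattern keys on the date format rather than on the key name, so it also prints a quoted date stored under a key that does not end in Date. A quick comparison on a made-up input line against the Date-anchored pattern from the previous answer (written here in portable basic-regex form):

```shell
# Made-up input: the date sits under a key that does not end in "Date".
line="{'created' => '2010-01-01'}"
# Date-format pattern (this answer): prints 2010-01-01.
echo "$line" | sed -n "s/.*'\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)'.*/\1/p"
# Key-name pattern (previous answer): prints nothing for this line.
echo "$line" | sed -n "s/.*Date' => '\([^']\{1,\}\)'.*/\1/p"
```

Which behaviour is right depends on whether the files contain other date-valued keys that should be excluded.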
Upvotes: 1