airbear

Reputation: 75

Unix: Grabbing dates from file and sorting them

I have multiple files that look like this:

//file start
$thing1 = {'item1' => '0', 'item2 => '3', 'itemDate' => '2013-10-01'};
$thing2 = {'item1' => '0', 'item2 => '3', 'itemDate' => '2012-11-01'};
$thing3 = {'item1' => '0', 'item2 => '3', 'itemDate' => '2014-12-01'};
//file end

Using Unix, what is the best way to grab all of the items in a file that are dates? I know that the items I'm looking for in the file look like

{somethingDate = '1111-11-11'}

From this I want to grab '1111-11-11'. File one will have multiple 'fileOneDate' entries, file two will have multiple 'fileTwoDate' entries, and so on. My goal is to take all of the dates stored under a '*Date' key, remove duplicates, and sort them into an output file, which is easy enough using the sort command and pipes. However, I'm stuck on the first part. What I have so far looks like this:

<command I'm working on now that grabs dates> | sort -n  > outputfile.txt

I believe the way to go would be an AWK script. What would be the right way to parse these files?

Upvotes: 1

Views: 67

Answers (3)

glenn jackman

Reputation: 247012

grep -o is the simplest way to extract just the matching text.
sort -u sorts (duh) and removes duplicates.

grep -oE '\<[0-9]{4}-[0-9]{2}-[0-9]{2}\>' <<'END' | sort -u
$thing1 = {'item1' => '0', 'item2 => '3', 'itemDate' => '2013-10-01'};
$thing2 = {'item1' => '0', 'item2 => '3', 'itemDate' => '2012-11-01'};
$thing3 = {'item1' => '0', 'item2 => '3', 'itemDate' => '2014-12-01'};
$thing2b= {'item1' => '0', 'item2 => '3', 'itemDate' => '2012-11-01'};
$thing2c= {'item1' => '0', 'item2 => '3', 'itemDate' => 'foo2012-01-01bar'};
END
2012-11-01
2013-10-01
2014-12-01
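
The pattern above picks up any date-shaped word, whatever key it is stored under. As a rough sketch, assuming the dates of interest always follow a key ending in Date and the ' => ' separator exactly as in the sample, a two-stage grep can narrow the match and read several files at once. The function name extract_dates is made up for illustration, and -o and -h are assumed to be available (they are in GNU and BSD grep, but are not required by POSIX).

```shell
# Hypothetical helper: print the unique, sorted dates stored under *Date keys.
# Usage: extract_dates FILE...
extract_dates() {
  # Stage 1: keep only the "Date' => 'YYYY-MM-DD'" fragments.
  #          -h suppresses file name prefixes when several files are given.
  # Stage 2: strip each fragment down to the bare date.
  grep -ohE "Date' => '[0-9]{4}-[0-9]{2}-[0-9]{2}'" "$@" |
    grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' |
    sort -u
}
```

It could then be used as extract_dates file1 file2 > outputfile.txt.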

Upvotes: 1

John1024

Reputation: 113924

If your sample file is called datefile, then:

$ sed -nr "s/.*Date' => '([^']+)'.*/\1/p" datefile | sort -n
2012-11-01
2013-10-01
2014-12-01

The above regex looks for lines containing Date' => 'datestring' and prints the datestring.

In more detail, the sed command consists of a substitution, which in sed is written as s/old/new/options. The old part is a bit complicated, so I will go through it piece by piece. The old regex looks for (a) .*, which means anything (any number of any characters), followed by (b) Date' => ', followed by (c) ([^']+), which means one or more characters that are not single quotes, followed by (d) a single quote, followed by (e) .*, again meaning anything. If a match is made, the whole line is replaced with the date string (saved as \1 because the date string regex was in parens), and then, because of the p at the end of the expression, that date is printed. Because the -n option is given to sed, lines with no matching date string are not printed.

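If your sed does not support -r (for example, BSD sed on OSX, which uses -E instead), then use a similar basic-regex expression with a few added backslashes. Since \+ is a GNU extension, the portable way to write "one or more" is \{1,\}:

sed -n "s/.*Date' => '\([^']\{1,\}\)'.*/\1/p" datefile | sort -n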
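
To tie this back to the stated goal (several input files, duplicates removed, result in an output file), here is a minimal sketch. It assumes each line holds at most one *Date entry, as in the sample, because the greedy leading .* keeps only the last one on a line. The function name collect_dates is invented for illustration, and the pattern uses the POSIX \{1,\} interval so it does not depend on GNU extensions.

```shell
# Hypothetical helper: write the unique, sorted *Date values to OUTFILE.
# Usage: collect_dates OUTFILE FILE...
collect_dates() {
  out=$1
  shift
  # sed prints only the captured date from matching lines (-n plus the p flag);
  # sort -u then sorts and drops duplicates in one step.
  sed -n "s/.*Date' => '\([^']\{1,\}\)'.*/\1/p" "$@" | sort -u > "$out"
}
```

For example: collect_dates outputfile.txt file1 file2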

Upvotes: 0

sat

Reputation: 14949

Is this what you need?

sed -n "s/.*'\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)'.*/\1/p"

If your sed has the -r option:

sed -nr "s/.*'([0-9]{4}-[0-9]{2}-[0-9]{2})'.*/\1/p"

Test:

sat:~# echo "{somethingDate = '1111-11-11'}" | sed -n "s/.*'\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)'.*/\1/p"
1111-11-11
sat:~#
sat:~# echo "\$thing1 = {'item1' => '0', 'item2 => '3', 'itemDate' => '2013-10-01'};" | sed -n "s/.*'\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)'.*/\1/p"
2013-10-01
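
Since the question mentions AWK, the same extraction can also be sketched there. This is only one possible approach, and it assumes the layout shown in the question: splitting each line on single quotes puts a key ending in Date, the ' => ' separator, and the value in three consecutive fields. The function name awk_dates is made up for illustration.

```shell
# Hypothetical helper: print the unique, sorted *Date values using awk.
# Usage: awk_dates FILE...
awk_dates() {
  # -F"'" splits every line on single quotes.
  # A field ending in "Date", followed by a "=>" field, marks a date entry;
  # the field after that is the value. seen[] drops duplicates across files.
  awk -F"'" '
    {
      for (i = 1; i + 2 <= NF; i++)
        if ($i ~ /Date$/ && $(i + 1) ~ /^ *=> *$/ && !seen[$(i + 2)]++)
          print $(i + 2)
    }
  ' "$@" | sort
}
```

Unlike the sed versions, this does not check that the value is shaped like a date; it trusts the key name.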

Upvotes: 1
