Reputation: 2502
I have a tab-delimited file that contains a date, a header row, some rows of values, and an empty row, and then the pattern repeats multiple times. The file looks something like this:
November 3, 2011
column_name1 column_name2 column_name3 column_name4
value value value value
value value value value
value value value value
value value value value
November 4, 2011
column_name1 column_name2 column_name3 column_name4
value value value value
value value value value
value value value value
value value value value
I am trying to find the right sed or awk commands to transform the data so it can be used to create charts. I want the transformed data to look like this:
date column_name1 column_name2 column_name3 column_name4
November 3, 2011 value value value value
November 3, 2011 value value value value
November 3, 2011 value value value value
November 3, 2011 value value value value
date column_name1 column_name2 column_name3 column_name4
November 4, 2011 value value value value
November 4, 2011 value value value value
November 4, 2011 value value value value
November 4, 2011 value value value value
Upvotes: 1
Views: 218
Reputation: 204876
Awk.
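BEGIN {
    FS = "\n"    # each line of a block is one field
    RS = ""      # paragraph mode: blocks are separated by blank lines
    OFS = "\t"
}
{
    # $1 is the date, $2 the header row, $3..$NF the data rows
    print "date" OFS $2
    for (i = 3; i <= NF; i++)
        print $1 OFS $i
    print ""     # keep the blank line between blocks
}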
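If reading whole blocks as records is not an option, the same transformation can be sketched line by line with a small state variable. This is only an illustration, not the answer's method: the `transform` function name, the `state` and `date` variables, and the shortened sample data are all made up for the example, and it assumes blocks are separated by an empty line as described in the question.

```shell
# Hypothetical sample: two short tab-separated blocks separated by an empty line.
sample=$(printf 'November 3, 2011\nc1\tc2\nv1\tv2\nv3\tv4\n\nNovember 4, 2011\nc1\tc2\nv5\tv6\n')

# transform: reads the blocks on stdin, writes the flattened table to stdout.
transform() {
  awk '
    BEGIN { OFS = "\t"; state = 0 }                   # state 0: expecting a date line
    /^[ \t]*$/ { state = 0; next }                    # blank line ends the current block
    state == 0 { date = $0; state = 1; next }         # remember the date
    state == 1 { print "date", $0; state = 2; next }  # header row gets a "date" column
    { print date, $0 }                                # data rows get the saved date
  '
}

result=$(printf '%s\n' "$sample" | transform)
printf '%s\n' "$result"
```

Unlike the record-based version, this sketch drops the blank line between blocks, which may or may not be what the charting tool expects.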
Upvotes: 2
Reputation: 58473
This GNU sed solution might work:
sed -r '/^[A-Z][a-z]+\s+[0-9][0-9]?,\s+([0-9]{4})/,/^$/{//{h;/^$/!{s/.*//;N;s/\n/date /;b}}};G;s/(.*)\n(.*)/\2 \1/;' input_file
EDIT: I should have included an explanation!
The sed command only changes lines between one that starts with a date, /^[A-Z][a-z]+\s+[0-9][0-9]?,\s+([0-9]{4})/, and an empty line, /^$/. If the current line matches one of those two conditions (//), it is stored in the hold space (h). Additionally, if the line is not empty (i.e. it is a date), the command clears it (s/.*//), appends the next line (N), and prepends the literal "date " to it (s/\n/date /). When this is all done, it branches (b) to read in the next line. For all following lines (remember, this is within the starting condition), it appends the hold space, i.e. the line containing the date, to the current line (G), then uses a substitution to move the date to the front and drop the newline that G introduced: s/(.*)\n(.*)/\2 \1/. Voila!
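The hold-space part of that explanation can be tried in isolation. The following is a minimal sketch, assuming a made-up three-line input in which the first line plays the role of the date. It uses basic regular expressions rather than the -r syntax of the command above, and it only demonstrates the h / G / swap idiom, not the full solution.

```shell
# Line 1 is saved to the hold space and deleted; every later line gets the
# saved text appended with G, and the substitution then moves it to the front.
result=$(printf 'DATE\nrow1\nrow2\n' |
  sed -e '1{h;d;}' -e 'G' -e 's/\(.*\)\n\(.*\)/\2 \1/')
printf '%s\n' "$result"
```

After G the pattern space holds the current line, a newline, and the saved line, so the two capture groups are simply emitted in reverse order with a space in place of the newline.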
Upvotes: 2
Reputation: 36272
Using 'Sed'
Content of 'infile':
$ cat infile
November 3, 2011
column_name1 column_name2 column_name3 column_name4
value value value value
value value value value
value value value value
value value value value
November 4, 2011
column_name1 column_name2 column_name3 column_name4
value value value value
value value value value
value value value value
value value value value
Content of the sed script:
$ cat script.sed
## When line has a date.
/[0-9]\+,[ ]*[0-9]\{4\}/ {
## Save date to HS (hold space).
h
## Read next line (header).
N
## Insert 'date' string at the beginning of the line.
s/.*\n/date\t/
## Print and read next line.
P
n
}
## Print a blank line unchanged and start the next cycle.
/^[ \t]*$/ {
p
d
}
## Process data rows by inserting the date at the beginning.
## Append the date saved earlier to the end of PS (pattern space), then swap it
## in front of the rest of the line. Print after that.
G
s/^\(.*\)\n\(.*\)$/\2\t\1/
p
Execute the script:
$ sed -n -f script.sed infile
date column_name1 column_name2 column_name3 column_name4
November 3, 2011 value value value value
November 3, 2011 value value value value
November 3, 2011 value value value value
November 3, 2011 value value value value
date column_name1 column_name2 column_name3 column_name4
November 4, 2011 value value value value
November 4, 2011 value value value value
November 4, 2011 value value value value
November 4, 2011 value value value value
Upvotes: 3