dabest1

Reputation: 2502

Append first line of paragraph to multiple lines

I have a tab-delimited file that contains a date, a header row, some values, and an empty row, and then this pattern repeats multiple times. The file looks something like this:

November 3, 2011
column_name1    column_name2    column_name3    column_name4
value   value   value   value
value   value   value   value
value   value   value   value
value   value   value   value

November 4, 2011
column_name1    column_name2    column_name3    column_name4
value   value   value   value
value   value   value   value
value   value   value   value
value   value   value   value

I am trying to find the right sed or awk commands to transform the data so it can be used to create charts. I want the transformed data to look like this:

date    column_name1    column_name2    column_name3    column_name4
November 3, 2011    value   value   value   value
November 3, 2011    value   value   value   value
November 3, 2011    value   value   value   value
November 3, 2011    value   value   value   value

date    column_name1    column_name2    column_name3    column_name4
November 4, 2011    value   value   value   value
November 4, 2011    value   value   value   value
November 4, 2011    value   value   value   value
November 4, 2011    value   value   value   value

Upvotes: 1

Views: 218

Answers (3)

ephemient

Reputation: 204876

Awk.

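BEGIN {
    FS = "\n"   # each line of a block is one field
    RS = ""     # paragraph mode: blocks are separated by blank lines
    OFS = "\t"
    #ORS = "\n"
}
{
    print "date" OFS $2             # header row, with a leading "date" column
    for (i = 3; i <= NF; i++)
        print $1 OFS $i             # $1 is the block's date line
    print ""
}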
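
A minimal sketch of how a program like this could be run from the shell. The function name add_date_column and the shortened sample data are hypothetical, chosen only for illustration. The sketch uses awk's paragraph mode (RS = ""), in which blank lines separate records and trailing newlines at the end of the input are ignored.

```shell
#!/bin/sh
# Hypothetical wrapper: prefixes every data row of a block with the
# block's date line. Reads the files given as arguments, or stdin.
add_date_column() {
    awk '
    BEGIN {
        RS = ""      # paragraph mode: records are separated by blank lines
        FS = "\n"    # each line of a block becomes one field
        OFS = "\t"
    }
    {
        print "date", $2              # header row gets a "date" column
        for (i = 3; i <= NF; i++)
            print $1, $i              # $1 is the date line of the block
        print ""                      # blank line between blocks
    }' "$@"
}

# Sample input, assumed tab-delimited as in the question.
printf 'November 3, 2011\ncol1\tcol2\nv1\tv2\n\nNovember 4, 2011\ncol1\tcol2\nv3\tv4\n' |
    add_date_column
```

With a real file, the call would be add_date_column input_file, where input_file stands for whatever the data file is called.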

Upvotes: 2

potong

Reputation: 58473

This GNU sed solution might work:

 sed -r '/^[A-Z][a-z]+\s+[0-9][0-9]?,\s+([0-9]{4})/,/^$/{//{h;/^$/!{s/.*//;N;s/\n/date /;b}}};G;s/(.*)\n(.*)/\2 \1/;' input_file

EDIT: I should have included an explanation!

The sed command only changes lines between one that starts with a date /^[A-Z][a-z]+\s+[0-9][0-9]?,\s+([0-9]{4})/ and an empty line /^$/. Within that range, if the line matches one of those two conditions (the empty regex // reuses the most recently used regex), it is stored in the hold space with h. Additionally, if the line is not an empty one (i.e. it is a date), the command clears it with s/.*//, appends the next line with N, and then prepends the literal date to it with s/\n/date /. When this is all done, b branches to the end of the script so the next line is read in. For all lines following (remember this is within the starting condition), it appends the hold space (the line containing the date) to the current line with G, then uses a substitution to move the date to the front and lose the newline, s/(.*)\n(.*)/\2 \1/ (the newline is a side effect of the G command). Voila!
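
The hold-space step at the end of that explanation can be tried in isolation. This is a minimal sketch, assuming GNU sed (for -r). It hard-codes line 1 as the date line, a simplification of the address range used in the full command, and the function name prefix_with_first_line is hypothetical.

```shell
#!/bin/sh
# Line 1 is treated as the date: copy it to the hold space (h), then
# delete it (d) so it is not printed on its own.
# Every other line gets the hold space appended (G), giving "line\ndate";
# the substitution then swaps the two halves and replaces the newline
# with a space.
prefix_with_first_line() {
    sed -r '1{h;d};G;s/(.*)\n(.*)/\2 \1/' "$@"
}

printf 'November 3, 2011\nvalue value\nvalue value\n' | prefix_with_first_line
```

Each output line here should be the date followed by a space and the original row, i.e. "November 3, 2011 value value" twice.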

Upvotes: 2

Birei

Reputation: 36272

Using sed:

Content of 'infile':

$ cat infile
November 3, 2011
column_name1    column_name2    column_name3    column_name4
value   value   value   value
value   value   value   value
value   value   value   value
value   value   value   value

November 4, 2011
column_name1    column_name2    column_name3    column_name4
value   value   value   value
value   value   value   value
value   value   value   value
value   value   value   value

Content of the sed script:

$ cat script.sed
## When line has a date.
/[0-9]\+,[ ]*[0-9]\{4\}/ {
        ## Save date to HS (hold space).
        h
        ## Read next line (header).
        N
        ## Insert 'date' string at the beginning of the line.
        s/.*\n/date\t/
        ## Print and read next line.
        P
        n
}

## Process next line if blank line found.
/^[ \t]*$/ {
        p
        d
}

## Process data inserting the date in the beginning.
## Put at the end of PS (pattern space) the date saved before and exchange it 
## with the rest of the line. Print after that.
G
s/^\(.*\)\n\(.*\)$/\2\t\1/
p

Execute the script:

$ sed -n -f script.sed infile
date    column_name1    column_name2    column_name3    column_name4
November 3, 2011        value   value   value   value
November 3, 2011        value   value   value   value
November 3, 2011        value   value   value   value
November 3, 2011        value   value   value   value

date    column_name1    column_name2    column_name3    column_name4
November 4, 2011        value   value   value   value
November 4, 2011        value   value   value   value
November 4, 2011        value   value   value   value
November 4, 2011        value   value   value   value

Upvotes: 3
