Reputation: 850
I've got an R summary table like so:
employee salary startdate
John Doe :1 Min. :21000 Min. :2007-03-14
Jolie Hope:1 1st Qu.:22200 1st Qu.:2007-09-18
Peter Gynn:1 Median :23400 Median :2008-03-25
Mean :23733 Mean :2008-10-02
3rd Qu.:25100 3rd Qu.:2009-07-13
Max. :26800 Max. :2010-11-01
and I need to produce an output csv file like so:
employee,,salary,,startdate,,
John Doe,1,Min.,21000,Min.,2007-03-14
Jolie Hope,1,1st Qu.,22200,1st Qu.,2007-09-18
Peter Gynn,1,Median,23400,Median,2008-03-25
,,Mean,23733,Mean,2008-10-02
,,3rd Qu.,25100,3rd Qu.,2009-07-13
,,Max.,26800,Max.,2010-11-01
so that in excel it looks something like this:
However it doesn't suffice to split the fields by one or more white spaces,
awk -F "[ ]+" '{ print $3 }'
It works for the header, but not for the remaining lines:
salary
Doe
Hope:1
Gynn:1
:23733
Qu.:25100
:26800
Is this problem solvable using awk (and maybe sed)?
Upvotes: 2
Views: 394
Reputation: 10039
sed '1 {
s/^[[:space:]]*\([^[:space:]]\{1,\}\)[[:space:]]\{1,\}\([^[:space:]]\{1,\}\)[[:space:]]\{1,\}[[:space:]]\{1,\}\([^[:space:]]\{1,\}\)/\1,,\2,,\3,/
b
}
s/[[:space:]]\{1,\}:/:/g
/^[[:space:]]*\([^:]\{1,\}\):\([^[:space:]]*\)[[:space:]]*\([^:]\{1,\}\):\([^[:space:]]*\)[[:space:]]*\([^:]\{1,\}\):\(.[^[:space:]]*\)/ {
s//\1,\2,\3,\4,\5,\6/
b
}
/^[[:space:]]*\([^:]\{1,\}\):\([^[:space:]]*\)[[:space:]]*\([^:]\{1,\}\):\([^[:space:]]*\)/ {
s//,,\1,\2,\3,\4/
b
}
' YourFile
sed one, just for the fun if you need to adapt a bit in this ArachnoRegEx
awk is lot more interesting in this case mainly for any adaptation to add later but if you only have access to sed ...
Upvotes: 3
Reputation: 203169
This uses GNU awk for FIELDWIDTHS, etc. and relies on the first line of input after the header always having all fields populated. It includes the positions that are just :
s as output fields, I expect you can figure out how to skip those if you do want to use this solution:
$ cat tst.awk
BEGIN { OFS="," }
NR==1 {
for (i=1;i<=NF;i++) {
printf "%s%s", $i, (i<NF?OFS OFS OFS:ORS)
}
next
}
NR==2 {
tail = $0
while ( match(tail,/([^:]+):(\S+(\s+|$))/,a) ) {
FIELDWIDTHS = FIELDWIDTHS length(a[1]) " 1 " length(a[2]) " "
tail = substr(tail,RSTART+RLENGTH)
}
$0 = $0
}
{
for (i=1;i<=NF;i++) {
gsub(/^\s+|\s+$/,"",$i)
}
print
}
$ awk -f tst.awk file
employee,,,salary,,,startdate
John Doe,:,1,Min.,:,21000,Min.,:,2007-03-14
Jolie Hope,:,1,1st Qu.,:,22200,1st Qu.,:,2007-09-18
Peter Gynn,:,1,Median,:,23400,Median,:,2008-03-25
,,,Mean,:,23733,Mean,:,2008-10-02
,,,3rd Qu.,:,25100,3rd Qu.,:,2009-07-13
,,,Max.,:,26800,Max.,:,2010-11-01
Upvotes: 1