Reputation: 45
Hi, I have a file that I need to put into a format that I can pull into an Excel spreadsheet. I do not know how to do this and I would appreciate it if you could help me out.
Here is the input sample
#1
Indiana University—Bloomington (Kelley)
Bloomington, IN
90 58 82 86
#1
Temple University (Fox)
Philadelphia, PA
95 66 97 95
#1
University of North Carolina—Chapel Hill (Kenan-Flagler)
Chapel Hill, NC
73 58 100 75
#4
Here is the desired output:
#1, Indiana University—Bloomington (Kelley) Bloomington, IN, 90, 58, 82, 86,
#1, Temple University (Fox) Philadelphia, PA, 95, 66, 97, 95,
I'm using shell scripting in Linux.
Thanks
Upvotes: 1
Views: 175
Reputation: 47099
Here is an alternative way of doing it with awk by only manipulating the output field separator (OFS) and the output record separator (ORS):
grep -v '^$' infile | # remove empty lines
awk 'NR%4 { ORS=", "; OFS=" " } NR%4 == 0 { ORS="\n"; OFS=", " } $1=$1'
Output:
#1, Indiana University—Bloomington (Kelley), Bloomington, IN, 90, 58, 82, 86
#1, Temple University (Fox), Philadelphia, PA, 95, 66, 97, 95
#1, University of North Carolina—Chapel Hill (Kenan-Flagler), Chapel Hill, NC, 73, 58, 100, 75
#4,
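For readability, the same program can be written out with comments; $1=$1 is a pattern whose side effect is to reassign the first field, which makes awk rebuild the record with the current OFS before the default print action fires:

grep -v '^$' infile |
awk 'NR%4      { ORS=", "; OFS=" "  }   # lines 1-3 of each record: terminate them with ", "
     NR%4 == 0 { ORS="\n"; OFS=", " }   # 4th line (the numbers): join fields with ", " and end the record
     $1=$1                              # rebuild the line with the current OFS, then print it'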
Upvotes: 0
Reputation: 10039
sed '#n
/[0-9 ]/ s/  */, /g;/^ *$/d;H;$!b;g;s/.//;s/\n\([^#]\)/, \1/g;p' YourFile
The s/\n\([^#]\)/, \1/g replaces each newline that is not followed by a # by a , plus the character that followed it, so each record ends up on one line. If the last , is mandatory (normally not in a csv/excel file), adapt the /[0-9 ]/ s/  */, /g with this:
/[0-9 ]/ {s/  */, /g; s/$/,/;}
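Read top to bottom, the commands do the following:

#n                     on a line of its own, suppresses automatic printing (same as sed -n)
/[0-9 ]/ s/  */, /g    on lines containing digits or spaces, turn runs of spaces into ", "
/^ *$/d                delete blank lines
H                      append the current line to the hold space
$!b                    if this is not the last line, stop here and read the next one
g                      on the last line, copy the accumulated hold space into the pattern space
s/.//                  drop the leading newline that H left at the front
s/\n\([^#]\)/, \1/g    join lines: a newline followed by a non-# character becomes ", " plus that character
p                      print the assembled result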
Upvotes: 0
Reputation: 2236
Awk script to solve the problem:
/^#[0-9]/         { current = $0 }                      # "#1" starts a new record
/\([-A-Za-z ]+\)/ { current = current "," $0 }          # school name: the line containing "(...)"
/[A-Z]+$/         { current = current " " $0 }          # city line: ends with the state code
/^[0-9]+/         { current = current "," $1 "," $2 "," $3 "," $4; print current }  # numbers: finish the record and print it
Usage:
cat yourdatafile | awk -f script.awk > output.csv
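The cat is not strictly necessary, since awk accepts file names directly:

awk -f script.awk yourdatafile > output.csv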
Explanation:
Each of the regexes matches one of the line types and runs the action beside it for that line.
Upvotes: 1
Reputation: 36337
While it's entirely possible to do that with a bit of awk scripting, I'd recommend you don't.
Actually, awk is handy for anything that's not too complex, but here, since you're already planning to use Excel, you might as well import the plain file and then process it in Excel: pivoting, reshaping and splitting it there.
However, I hate Excel's complexity, so here's my python2 approach (saved as program.py and made executable with chmod 755 program.py):
#!/usr/bin/python
import sys

wholefile = open(sys.argv[1], "r").read()
parts = wholefile.split("#")
for item in parts:
    if not item.strip():
        continue  # skip the empty chunk before the first "#"
    lines = item.split("\n")
    # lines[0] is the rank, lines[2] the school, lines[3] the city/state and
    # lines[5] the numbers; the blank lines of the file occupy the other slots
    output = [lines[0], lines[2], lines[3]] + lines[5].split()
    print ";".join(output)
and run this as
./program.py input.txt > output.csv
EDIT: typo, and:
I tend to say this too often, but doing something in a shell script that isn't really centred on invoking a lot of external commands is often far less effective than using a general-purpose scripting language. Python is so abundant everywhere that I seldom find myself writing bash scripts.
EDIT2: Ok, so no python on your host. Scary ;P. Use bash's built-in read function (man read).
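For reference, a rough sketch of that read-based loop, assuming the record layout shown in the question (rank, school, city, numbers; blank lines are skipped); the variable names are only illustrative:

#!/bin/bash
while IFS= read -r line; do
    [ -z "$line" ] && continue            # skip blank lines
    case $line in
        '#'*)   rank=$line ;;             # "#1" starts a record
        [0-9]*) nums=${line// /, }        # "90 58 82 86" -> "90, 58, 82, 86"
                printf '%s, %s %s, %s\n' "$rank" "$name" "$city" "$nums" ;;
        *,*)    city=$line ;;             # a city line such as "Bloomington, IN"
        *)      name=$line ;;             # the school name
    esac
done < input.txt > output.csv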
Upvotes: 0
Reputation: 44023
This is rather simple with GNU awk and mawk if you don't try to use it in a line-based manner. We'll use a # at the beginning of a line as the record separator and a newline as the field separator. Then:
awk -v RS='(^|\n)#' -F'\n' 'NR > 1 { gsub(/ +/, ", ", $6); print "#" $1 ", " $3 " " $4 ", " $6 }' filename
That is:
NR > 1 {                                    # the first record is the empty bit before
                                            # the first separator, so we skip it
    gsub(/ +/, ", ", $6)                    # then: insert commas in the number row
    print "#" $1 ", " $3 " " $4 ", " $6     # and reassemble the record in the right
                                            # format for printing.
}
The use of a regex as record separator is not strictly POSIX-conforming, but between gawk and mawk, you'll have most bases covered.
Upvotes: 3