Reputation: 11
I'm new to bash and grep. I am trying to produce a CSV file from a TXT file that contains these lines:
Input:
1. Fisrt - Name: Joanna Last - Name: Yang
Place of birth: Paris Date of birth: 01/01/1972 Sex: F
Number: 0009876541234567
2. Fisrt - Name: Bob Last - Name: Lee
Place of birth: London Date of birth: 05/08/1969 Sex: M
Number: 0005671890765223
Output:
"Joanna","Yang","Paris","01/01/1972","F","0009876541234567"
"Bob","Lee","London","05/08/1969","M","0005671890765223"
Any suggestions would be appreciated!
Upvotes: 0
Reputation: 1187
In one line:
~ $ cat yourfile.txt
1. Fisrt - Name: Joanna Last - Name: Yang
Place of birth: Paris Date of birth: 01/01/1972 Sex: F
Number: 0009876541234567
2. Fisrt - Name: Bob Last - Name: Lee
Place of birth: London Date of birth: 05/08/1969 Sex: M
Number: 0005671890765223
~ $ sed -r "s/^.*Fisrt - Name: (.*) Last - Name: (.*)$/\1,\2;/g" yourfile.txt | sed -r "s/^Place of birth: (.*) Date of birth: (.*) Sex: (.*)$/\1,\2,\3;/g" | sed -r "s/^Number: (.*)$/\1/g" | sed -n 'H;${x;s/;\n/,/g;s/^,//;p;}' | tail -n +2 > yourfile.csv
~ $ cat yourfile.csv
Joanna,Yang,Paris,01/01/1972,F,0009876541234567
Bob,Lee,London,05/08/1969,M,0005671890765223
~ $
Hope it helps.
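The pipeline above prints unquoted fields, while the question asks for each field in double quotes. A possible single-sed variant, assuming GNU or BSD sed (for -E and for \n inside the pattern) and records that are always exactly three lines long; the function name quoted_csv is hypothetical:

```shell
# quoted_csv FILE: hypothetical helper name, not part of the answer above.
# Assumes each record is exactly three lines, laid out as in the question.
quoted_csv() {
  # N;N appends the next two lines, so the pattern space holds one whole record
  # (its lines joined by \n). -n suppresses auto-printing; the p flag prints
  # only records that matched the whole pattern.
  sed -nE 'N;N;s/^[0-9]+\. Fisrt - Name: (.*) Last - Name: (.*)\nPlace of birth: (.*) Date of birth: (.*) Sex: (.*)\nNumber: (.*)$/"\1","\2","\3","\4","\5","\6"/p' "$1"
}

# Usage: quoted_csv yourfile.txt > yourfile.csv
```

Because each label occurs only once per record, the greedy (.*) groups cannot run past their neighbours, so names containing spaces are captured whole.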
Upvotes: 0
Reputation: 141373
If your file is consistently formatted, no regexes are needed.
We can read three lines at a time and split each one on spaces, keeping only the fields we need. If you can "assert" that no field in the file contains spaces (I think no valid human name has spaces in it... right?), you can just do this:
while
  IFS=' ' read -r _ _ _ _ name _ _ _ last &&
  IFS=' ' read -r _ _ _ birthplace _ _ _ birthdate _ sex &&
  IFS=' ' read -r _ number
do
  printf '"%s","%s","%s","%s","%s","%s"\n' \
    "$name" "$last" "$birthplace" "$birthdate" "$sex" "$number"
done <input
Live version available at onlinedbg.
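The space caveat can be lifted while keeping the same read-three-lines structure: instead of splitting on spaces, strip the known labels with POSIX parameter expansion (#, ## and %). A sketch, assuming the labels appear exactly as in the question; the function name csv_from_records is hypothetical:

```shell
# csv_from_records: hypothetical helper name; reads the records on stdin.
# Assumes the labels ("Fisrt - Name: ", " Last - Name: ", ...) appear exactly
# as in the question and each record is three lines long.
csv_from_records() {
  while IFS= read -r l1 && IFS= read -r l2 && IFS= read -r l3; do
    # Line 1: keep what lies between "Fisrt - Name: " and " Last - Name: ".
    name=${l1#*Fisrt - Name: }
    name=${name% Last - Name: *}
    last=${l1##* Last - Name: }
    # Line 2: drop "Place of birth: ", then cut at the remaining labels.
    birthplace=${l2#Place of birth: }
    birthplace=${birthplace% Date of birth: *}
    rest=${l2##* Date of birth: }   # e.g. "01/01/1972 Sex: F"
    birthdate=${rest% Sex: *}
    sex=${rest##* Sex: }
    # Line 3: drop the "Number: " label.
    number=${l3#Number: }
    printf '"%s","%s","%s","%s","%s","%s"\n' \
      "$name" "$last" "$birthplace" "$birthdate" "$sex" "$number"
  done
}

# Usage: csv_from_records < input > output.csv
```

Since the cuts are made at the labels rather than at spaces, a record such as "Mary Ann" / "Van Dyke" / "New York" comes through intact.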
Upvotes: 0
Reputation: 7986
Using only one regex with grep won't be easy.
You can try multiple regexes and concatenate the results.
For instance:
To get the first names you can use the regex "Fisrt - Name: ([a-zA-Z]+)" (this is extended syntax, so use grep -E; add -o so grep prints only the matched text rather than the whole line).
Save this into a variable.
Next, to get the birth dates you can use "birth: ([0-9]+/[0-9]+/[0-9]+)".
Save this into a variable too.
Do the same for each part and concatenate the results with a comma.
It's clearly not the best way, but it's a start. For help with regexes you can use https://regex101.com/ .
You could also try the sed command-line tool.
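A sketch of the one-regex-per-column idea, assuming grep supports -o and -E (GNU and BSD grep both do). grep prints the whole match rather than a capture group, so sed strips the "Label: " prefix and adds the quotes; writing each column to a temporary file and joining them with paste -d, stands in for "save into a variable and concatenate". The name per_field_csv and the temp-file layout are illustrative choices:

```shell
# per_field_csv FILE: hypothetical helper name, not from the answer above.
# One grep per column; the columns are then joined line by line with commas.
per_field_csv() {
  tmp=$(mktemp -d) || return 1
  i=0
  for re in 'Fisrt - Name: [a-zA-Z]+' 'Last - Name: [a-zA-Z]+' \
            'Place of birth: [a-zA-Z]+' 'Date of birth: [0-9]+/[0-9]+/[0-9]+' \
            'Sex: [A-Z]' 'Number: [0-9]+'; do
    i=$((i + 1))
    # grep -oE prints only the matched text; sed drops "Label: " and quotes the value.
    grep -oE "$re" "$1" | sed 's/^.*: \(.*\)$/"\1"/' > "$tmp/$i"
  done
  # paste joins line N of every column file into row N of the CSV.
  paste -d, "$tmp/1" "$tmp/2" "$tmp/3" "$tmp/4" "$tmp/5" "$tmp/6"
  rm -r "$tmp"
}

# Usage: per_field_csv yourfile.txt > yourfile.csv
```

The columns are matched up only by position, so a record with a missing or differently formatted field shifts every later row; like the regexes it is built from, it also captures only single-word names and places.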
Upvotes: 1