Reputation: 67
I want to split a text file into multiple text files, starting a new file at each line that begins with a number followed by a period (1. *). For example, I want to split this text file into 2 files:
1. J Med Chem. 2013 May 23;56(10):4028-43. doi: 10.1021/jm400241j. Epub 2013 May 13.
Optimization of benzoxazole-based inhibitors of Cryptosporidium parvum inosine
5'-monophosphate dehydrogenase.
Gorla SK, Kavitha M, Zhang M, Chin JE, Liu X, Striepen B, Makowska-Grzyska M, Kim
Y, Joachimiak A, Hedstrom L, Cuny GD.
Department of Biology, Brandeis University , 415 South Street, Waltham,
Massachusetts 02454, USA.
Cryptosporidium parvum is an enteric protozoan parasite that has emerged as a
major cause of diarrhea, malnutrition, and gastroenteritis and poses a potential
bioterrorism threat.
PMID: 23668331 [PubMed - indexed for MEDLINE]
2.Biochem Pharmacol. 2013 May 1;85(9):1370-8. doi: 10.1016/j.bcp.2013.02.014. Epub
2013 Feb 16.
Carbonyl reduction of triadimefon by human and rodent 11β-hydroxysteroid
dehydrogenase 1.
Meyer A, Vuorinen A, Zielinska AE, Da Cunha T, Strajhar P, Lavery GG, Schuster D,
Odermatt A.
Swiss Center for Applied Human Toxicology and Division of Molecular and Systems
Toxicology, Department of Pharmaceutical Sciences, University of Basel,
Klingelbergstrasse 50, 4056 Basel, Switzerland.
11β-Hydroxysteroid dehydrogenase 1 (11β-HSD1) catalyzes the conversion of
inactive 11-oxo glucocorticoids (endogenous cortisone, 11-dehydrocorticosterone
and synthetic prednisone) to their potent 11β-hydroxyl forms (cortisol,
corticosterone and prednisolone).
Copyright © 2013 Elsevier Inc. All rights reserved.
PMID: 23419873 [PubMed - indexed for MEDLINE]
I tried this:
awk 'NF{print > $2;close($2);}' file
and this:
split -l 2
but I'm not sure how to handle the empty lines. (I'm new to awk.)
Upvotes: 0
Views: 345
Reputation: 67211
This should work.
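awk -F'[.]' '/^ +[0-9]+\./ {
    if (file) close(file)   # close the previous output file
    num = $1                # copy $1 so that $0 is not rebuilt
    gsub(/ /, "", num)      # strip the leading spaces
    file = "file_" num      # e.g. "file_1"
}
file { print > file }' Your_file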
Upvotes: 0
Reputation: 203229
I THINK what you're looking for is:
awk '/^[[:space:]]+[[:digit:]]+\./{ if (fname) close(fname); fname="out_"$1; sub(/\..*/,"",fname) } {print > fname}' file
Commented version per @zjhui's request:
awk '
/^[[:space:]]+[[:digit:]]+\./ {   # IF the line starts with spaces, then digits, then a period THEN
    if (fname)                    #     IF the output file name variable is populated THEN
        close(fname)              #         close the file you have been writing to until now
                                  #     ENDIF
    fname = "out_" $1             #     set the output file name to the word "out_" followed by the first field of this line, e.g. "out_2.Biochem"
    sub(/\..*/, "", fname)        #     strip everything from the period on from the file name so it becomes e.g. "out_2"
}                                 # ENDIF
{                                 # IF true THEN
    print > fname                 #     print the current record to the filename stored in the variable fname, e.g. "out_2"
}                                 # ENDIF
' file
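If you want to try the idea on something small first, here is a minimal sketch you can run in an empty scratch directory. The file name sample.txt, the two shortened records, and the ".txt" suffix on the output names are assumptions made for illustration only. The pattern also differs slightly from the one above: it treats the leading whitespace as optional, and a guard skips any lines that come before the first numbered line.

```shell
# Hypothetical miniature input: two records, each starting with "N.",
# with blank lines between the parts of a record.
printf '%s\n' \
  '1. J Med Chem. 2013 May 23;56(10):4028-43.' \
  '' \
  'Optimization of benzoxazole-based inhibitors.' \
  '' \
  '2. Biochem Pharmacol. 2013 May 1;85(9):1370-8.' \
  '' \
  'Carbonyl reduction of triadimefon.' > sample.txt

awk '
/^[ \t]*[0-9]+\./ {            # header line: optional blanks, digits, a period
    if (fname) close(fname)    # finish the previous output file
    fname = $1                 # first field, e.g. "1." or "2.Biochem"
    sub(/\..*/, "", fname)     # keep only the number
    fname = "out_" fname ".txt"
}
fname { print > fname }        # write the line; skip anything before the first header
' sample.txt

# Show how many lines ended up in each output file.
wc -l out_1.txt out_2.txt
```

With this input, out_1.txt receives the four lines up to the second header and out_2.txt the remaining three. The blank lines need no special handling, because every line that is not a header simply goes to whichever file is currently open.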
Upvotes: 3