Reputation: 67

Splitting file into multiple files

I want to split a text file into multiple text files, starting a new file at each line that begins with a number followed by a period (e.g. " 1."). For example, I want to split this text file into 2 files:

 1. J Med Chem. 2013 May 23;56(10):4028-43. doi: 10.1021/jm400241j. Epub 2013 May 13.

Optimization of benzoxazole-based inhibitors of Cryptosporidium parvum inosine
5'-monophosphate dehydrogenase.

Gorla SK, Kavitha M, Zhang M, Chin JE, Liu X, Striepen B, Makowska-Grzyska M, Kim
Y, Joachimiak A, Hedstrom L, Cuny GD.

Department of Biology, Brandeis University , 415 South Street, Waltham,
Massachusetts 02454, USA.

Cryptosporidium parvum is an enteric protozoan parasite that has emerged as a
major cause of diarrhea, malnutrition, and gastroenteritis and poses a potential 
bioterrorism threat.

PMID: 23668331  [PubMed - indexed for MEDLINE]


 2.Biochem Pharmacol. 2013 May 1;85(9):1370-8. doi: 10.1016/j.bcp.2013.02.014. Epub 
2013 Feb 16.

Carbonyl reduction of triadimefon by human and rodent 11β-hydroxysteroid
dehydrogenase 1.

Meyer A, Vuorinen A, Zielinska AE, Da Cunha T, Strajhar P, Lavery GG, Schuster D,
Odermatt A.

Swiss Center for Applied Human Toxicology and Division of Molecular and Systems
Toxicology, Department of Pharmaceutical Sciences, University of Basel,
Klingelbergstrasse 50, 4056 Basel, Switzerland.

11β-Hydroxysteroid dehydrogenase 1 (11β-HSD1) catalyzes the conversion of
inactive 11-oxo glucocorticoids (endogenous cortisone, 11-dehydrocorticosterone
and synthetic prednisone) to their potent 11β-hydroxyl forms (cortisol,
corticosterone and prednisolone).

Copyright © 2013 Elsevier Inc. All rights reserved.

PMID: 23419873  [PubMed - indexed for MEDLINE]

I tried this:

awk 'NF{print > $2;close($2);}' file

and this:

split -l 2

but I'm not sure how to handle the empty lines. (I'm new to awk.)

Upvotes: 0

Answers (2)

Vijay

Reputation: 67319

This should work.

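awk -F'[.]' '/^ +[0-9]+\./ {           # the brace must sit on the pattern line
            if (file) close(file)     # close the previous output file
            n = $1                    # copy $1 so $0 is not rebuilt
            gsub(/ /, "", n)
            file = "file_" n
           }
          file {                      # skip lines before the first header
            print > file
          }' Your_file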

Upvotes: 0

Ed Morton

Reputation: 204638

I THINK what you're looking for is:

awk '/^[[:space:]]+[[:digit:]]+\./{ if (fname) close(fname); fname="out_"$1; sub(/\..*/,"",fname) } {print > fname}' file

Commented version per @zjhui's request:

awk '
/^[[:space:]]+[[:digit:]]+\./ {     # IF the line starts with spaces, then digits then a period THEN
    if (fname)                      #     IF the output file name variable is populated THEN
        close(fname)                #         close the file you have been writing to until now
                                    #     ENDIF
    fname="out_"$1                  #     set the output file name to the word "out_" followed by the first field of this line, e.g. "out_2.Biochem"
    sub(/\..*/,"",fname)            #     strip everything from the period on from the file name so it becomes e.g. "out_2"
}                                   # ENDIF
{                                   # IF true THEN
    print > fname                   #     print the current record to the filename stored in the variable fname, e.g. "out_2".
}                                   # ENDIF
' file
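
To try the script end to end, here is a minimal sketch that builds a small two-record sample and splits it. The file name sample.txt, the scratch directory and the shortened records are assumptions for illustration only. The `fname` guard on the print rule is not part of the script above; it is added here so that any lines appearing before the first numbered header are skipped instead of causing an error.

```shell
#!/bin/sh
# Work in a scratch directory so the out_* files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: a stray preamble line, then two numbered records.
printf '%s\n' \
  'preamble line before any record' \
  ' 1. J Med Chem. 2013 May 23;56(10):4028-43.' \
  '' \
  'PMID: 23668331' \
  '' \
  ' 2.Biochem Pharmacol. 2013 May 1;85(9):1370-8.' \
  '' \
  'PMID: 23419873' > sample.txt

awk '
/^[[:space:]]+[[:digit:]]+\./ {
    if (fname) close(fname)     # finish the previous output file
    fname = "out_" $1           # e.g. "out_1." or "out_2.Biochem"
    sub(/\..*/, "", fname)      # keep only "out_1", "out_2"
}
fname { print > fname }         # skip lines seen before the first header
' sample.txt

ls out_*
```

With this sample, the run should leave two files, out_1 and out_2, in the scratch directory, and the preamble line should appear in neither of them.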

Upvotes: 3
