ferrelwill

Reputation: 821

Split file based on a key and name the output with the key name

Does anyone know how to split a file based on a key and name each output file after its key? Thanks in advance.

Input

>mail9.country1(+):38689378-38709400
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>father
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>mother
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>son
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>daughter
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------

>mailX.countryX(+):000000-3111111110
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>father
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>mother
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>son
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>daughter
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------

The output files should look like this, each with its respective content:

mail9.country1(+):38689378-38709400.mail

>mail9.country1(+):38689378-38709400
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>father
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>mother
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>son
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>daughter
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------

mailX.countryX(+):000000-3111111110.mail

>mailX.countryX(+):000000-3111111110
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>father
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>mother
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>son
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------
>daughter
XXXXXXXXXXXXXXXXX-----------HHHHHHHH------

Upvotes: 0

Views: 142

Answers (2)

eeasterly

Reputation: 774

Also consider an approach using csplit on macOS. Set up the input variables, then run the code line by line in a bash terminal.

#input variables
filename=input_file
split_on="mail"
suffix=".mail"

#this regex prepares the renaming and is the crux of the approach
#the first grep line at the bottom only prints the mv commands so you can check them; the second one actually does the renaming
regex_for_extracting_filename="s/^(xx[^:]+).+>(mail[^(]+)\(.+/mv \1 \2${suffix}/gi"

#count the key lines; csplit needs the pattern repeated (count - 1) more times
splitcount=$(grep -Fc "${split_on}" "${filename}")
splitnumber=$(( splitcount - 1 ))
echo -e "There are going to be the following number of files: \n${splitcount}\n"

#actually does the splitting work (pieces are written to files named xx plus a number)
csplit -n${#splitcount} -ks "${filename}" "/${split_on}/" "{${splitnumber}}"

grep -F "${split_on}" xx* | perl -pe "${regex_for_extracting_filename}" #shows the rename commands
grep -F "${split_on}" xx* | perl -pe "${regex_for_extracting_filename}" | bash -

Upvotes: 0

Chris Seymour

Reputation: 85865

One way with awk:

$ awk -F'>' '$2~/^mail/{f=$2".mail";gsub(/[)(]/,"_",f)}{print > f}' file
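
For reference, here is the same idea spread over several lines with comments, as a sketch rather than a drop-in replacement. Two things are additions that the one-liner does not have: each finished output file is closed before the next one is opened (useful when there are many keys), and lines that appear before the first key are skipped. The sample data and the scratch directory are made up for the illustration.

```shell
# Build a small sample in a scratch directory (made-up data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > file <<'EOF'
>mail9.country1(+):38689378-38709400
XXXX----HHHH
>father
XXXX----HHHH

>mailX.countryX(+):000000-3111111110
XXXX----HHHH
>mother
XXXX----HHHH
EOF

awk -F'>' '
  $2 ~ /^mail/ {            # header line that starts a new group
    if (f != "") close(f)   # addition: release the previous output file
    f = $2 ".mail"          # output name: key plus the .mail suffix
    gsub(/[)(]/, "_", f)    # replace parentheses, as in the one-liner
  }
  f != "" { print > f }     # addition: ignore anything before the first key
' file

ls
```

Because of the gsub, the files are named with "_" in place of "(" and ")". Dropping that line should keep the key name exactly as it appears in the header, provided the shell and filesystem accept those characters.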

Upvotes: 1
