Reputation: 5480
I have a file in Linux. The contents of the file are below.
Test_12
Test_abc
start_1
start_abcd
end_123
end_abcde_12
Now I want to split the file into multiple smaller files, grouping the lines by the string that comes before the first underscore.
Output:
Test.txt:
Test_12
Test_abc
start.txt:
start_1
start_abcd
end.txt:
end_123
end_abcde_12
I have tried the following:
while read -r line ; do
echo "$line" >> "${line}.txt"
done < split.txt
But I got a separate file for each line.
What am I doing wrong here and how can I get my desired output?
Upvotes: 0
Views: 834
Reputation: 791
Can you try this:
while IFS= read -r line; do
    # Take the part before the first underscore as the output file name
    content=$(printf '%s\n' "$line" | awk -F_ '{print $1}')
    # >> creates the file if it does not exist and appends otherwise
    printf '%s\n' "$line" >> "$content.txt"
done < file.txt
Output:
bash-4.4$ ls -lrt
total 12
-rw-r--r-- 1 21726 21726 978 Sep 22 04:54 README.txt
-rw-r--r-- 1 21726 21726 49 Sep 22 04:56 file.txt
-rwxr-xr-x 1 21726 21726 252 Sep 22 05:06 script.sh
bash-4.4$ cat file.txt
Test_12
Test_abc
Start_1
Start_abc
end_1
end_abc
bash-4.4$ ./script.sh
bash-4.4$ ls -lrt
total 24
-rw-r--r-- 1 21726 21726 978 Sep 22 04:54 README.txt
-rw-r--r-- 1 21726 21726 49 Sep 22 04:56 file.txt
-rwxr-xr-x 1 21726 21726 252 Sep 22 05:06 script.sh
-rw-r--r-- 1 21726 21726 17 Sep 22 05:06 Test.txt
-rw-r--r-- 1 21726 21726 18 Sep 22 05:06 Start.txt
-rw-r--r-- 1 21726 21726 14 Sep 22 05:06 end.txt
bash-4.4$ cat Start.txt
Start_1
Start_abc
bash-4.4$ cat Test.txt
Test_12
Test_abc
bash-4.4$ cat end.txt
end_1
end_abc
Upvotes: 0
Reputation: 784918
Better to use awk for this:
awk -F_ 'p && $1 != p{close(fn)} {p=$1; fn=p ".txt"; print>>fn} END{close(fn)}' split.txt
There is a little bit of extra handling to close the output file when the value in the first column changes, so that we don't end up with too many open files if your input file is huge.
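For readers who find the one-liner dense, here is a sketch of the same logic spread over several lines with comments. The scratch directory, the sample data copied from the question, and the explicit `p != ""` guard are assumptions added for illustration; they are not part of the command above.

```shell
# Work in a scratch directory so no existing files are touched (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input taken from the question.
printf '%s\n' Test_12 Test_abc start_1 start_abcd end_123 end_abcde_12 > split.txt

awk -F_ '
    # The prefix changed: close the previous output file first.
    p != "" && $1 != p { close(fn) }
    # Remember the prefix, build the file name, and append the line.
    { p = $1; fn = p ".txt"; print >> fn }
    # Close the last file that is still open.
    END { if (fn != "") close(fn) }
' split.txt

cat end.txt
```

Because the script appends with `>>`, running it twice in the same directory duplicates the lines, so remove old output files before a rerun.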
Upvotes: 2
Reputation: 203169
Read "Why is using a shell loop to process text considered bad practice?" and then just use awk.
With GNU awk all you need is:
awk -F'_' '{print > ($1".txt")}' file
With other awks, if your input file is grouped by the first field, as shown in your question, then all you need is:
awk -F'_' '{f=$1".txt"; print > f} f!=p{close(p); p=f}' file
If it isn't grouped, this is just slightly less efficient, as you may need to re-open a file that was previously closed (hence the >> instead of >):
awk -F'_' '{f=$1".txt"; print >> f} f!=p{close(p); p=f}' file
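As a rough check of the ungrouped case, the sketch below runs that last command on input where lines sharing a prefix are not adjacent. The file name `ungrouped.txt`, its contents, and the scratch directory are hypothetical and chosen only for this illustration.

```shell
# Work in a scratch directory so no existing files are touched (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: lines with the same prefix are interleaved.
printf '%s\n' Test_12 start_1 Test_abc end_123 start_abcd > ungrouped.txt

# Append (>>) so that a file reopened after close() keeps its earlier lines.
awk -F'_' '{f=$1".txt"; print >> f} f!=p{close(p); p=f}' ungrouped.txt

cat Test.txt
```

With `>` instead of `>>`, reopening `Test.txt` for the third input line would truncate it and lose `Test_12`. The flip side of `>>` is that leftover output files from an earlier run are appended to, so clear them first.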
Upvotes: 0
Reputation: 361565
You need to trim the first underscore and the trailing text from each line. %%_* does that:
while read -r line ; do
echo "$line" >> "${line%%_*}.txt"
done < split.txt
Explanation:
%: trim trailing text
%%: find the longest possible match
_*: an underscore and everything after
Upvotes: 1
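To see why the doubled %% matters here, this small sketch compares % and %% on the one sample line from the question that contains two underscores. The variable names `short` and `long` are made up for the illustration.

```shell
line='end_abcde_12'

# % removes the shortest suffix matching _* (from the last underscore onward).
short=${line%_*}
# %% removes the longest suffix matching _* (from the first underscore onward).
long=${line%%_*}

printf '%s\n' "$short" "$long"
# prints:
# end_abcde
# end
```

With a single %, the line `end_abcde_12` would land in `end_abcde.txt` rather than `end.txt`.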