Tsabit Abdullah

Reputation: 3

How to add a column to a .txt file based on its file name

This is my script:

SourceFile='/root/Document/Source/'

FND=$(find "$SourceFile" -regextype posix-regex -iregex "^.*/ABCDEF_555_[0-9]{5}\.txt$")
echo "$FND"
# I've tried using awk but haven't gotten the results I want

File names:

ABCDEF_555_12345.txt
ABCDEF_555_54321.txt
ABCDEF_555_11223.txt

BEFORE

Content of ABCDEF_555_12345.txt:
no|name|address|pos_code
1|rick|ABC|12342
2|rock|ABC|12342
3|Robert|DEF|54321

Content of ABCDEF_555_54321.txt:
no|id|name|city
1|0101|RIZKI|JKT
2|0102|LALA|SMG
3|0302|ROY|YGY

I want to append a name_file column header to the first line, append a column containing the file name to every line from the second onward, and change the contents of the original files.

AFTER

file: ABCDEF_555_12345.txt
no|name|address|pos_code|name_file
1|rick|ABC|12342|ABCDEF_555_12345.txt
2|rock|ABC|12342|ABCDEF_555_12345.txt
3|Robert|DEF|54321|ABCDEF_555_12345.txt

file: ABCDEF_555_54321.txt
no|id|name|city|name_file
1|0101|RIZKI|JKT|ABCDEF_555_54321.txt
2|0102|LALA|SMG|ABCDEF_555_54321.txt
3|0302|ROY|YGY|ABCDEF_555_54321.txt

Please point me toward a solution. Thanks!

Upvotes: 0

Views: 6152

Answers (2)

AlexisBRENON

Reputation: 3079

The best solution is to use awk.

If it's the first line (NR == 1), print the line and append |name_file. For every other line, print the line and append the file name from the FILENAME variable:

awk 'NR == 1 {print $0 "|name_file"; next;}{print $0 "|" FILENAME;}' foo.txt
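This one-liner writes to standard output and leaves foo.txt unchanged, whereas the question asks to modify the original files. A minimal sketch of one portable way to do that (no GNU-specific options) is to write the result to a temporary file and move it over the original only if awk succeeds. The temporary directory and sample file are hypothetical demo setup, and basename is used so the new column holds the bare file name rather than a path.

```shell
# Demo setup (hypothetical): a sample file in a throwaway directory.
dir=$(mktemp -d)
printf 'no|name|address|pos_code\n1|rick|ABC|12342\n2|rock|ABC|12342\n' \
  > "$dir/ABCDEF_555_12345.txt"

for f in "$dir"/ABCDEF_555_*.txt; do
  name=$(basename "$f")   # bare file name, without the directory
  # Same awk logic as above, with the name passed in explicitly; the temp
  # file replaces the original only if awk exits successfully.
  awk -v fname="$name" 'NR == 1 {print $0 "|name_file"; next} {print $0 "|" fname}' \
    "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$dir/ABCDEF_555_12345.txt"
```

To use it on the real files, replace the demo setup with dir='/root/Document/Source' and keep the loop. Running it twice adds the column twice, so run it once per file.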

You can also use it with multiple files:

find . -iname "*.txt" -print0 | xargs -0 awk '
NR == 1 {print $0 "|name_file"; next;}
FNR == 1 {next;} # Skip the header of every subsequent file
{print $0 "|" FILENAME;}'
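In that multi-file version NR counts lines across all files, so only the first file's header is kept, and FILENAME holds the path as find printed it (for example with a leading ./). Since the question's files have different headers, here is a sketch that instead uses FNR (the line number within the current file) to keep each file's own header and strips the directory part of FILENAME. The file names and rows are sample data from the question written to a temporary directory; the result still goes to standard output.

```shell
dir=$(mktemp -d)
printf 'no|name|address|pos_code\n1|rick|ABC|12342\n' > "$dir/ABCDEF_555_12345.txt"
printf 'no|id|name|city\n1|0101|RIZKI|JKT\n' > "$dir/ABCDEF_555_54321.txt"

out=$(awk '
FNR == 1 { print $0 "|name_file"; next }   # FNR restarts at 1 for every input file
{
  n = FILENAME
  sub(/.*\//, "", n)                        # drop everything up to the last "/"
  print $0 "|" n
}' "$dir/ABCDEF_555_12345.txt" "$dir/ABCDEF_555_54321.txt")

printf '%s\n' "$out"
```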

My first solution used the paste command.

The paste command concatenates files horizontally (whereas cat concatenates them vertically). To get the same result with paste:

  1. First, join the first line of the file (head -n1 foo.txt) with the column header (echo "name_file"). The paste command accepts the -d flag to set the separator between columns.
  2. Second, take all lines except the first (tail -n+2 foo.txt) and join them with a column that repeats foo.txt once per line (generated by a for loop that counts the lines to fill).

The solution looks like this (it needs a shell with <(...) process substitution, such as bash):

paste -d'|' <(head -n1 foo.txt) <(echo "name_file")
paste -d'|' <(tail -n+2 foo.txt) <(for i in $(seq $(tail -n+2 foo.txt | wc -l)); do echo "foo.txt"; done)

Output:

no|name|address|pos_code|name_file
1|rick|ABC|12342|foo.txt
2|rock|ABC|12342|foo.txt
3|Robert|DEF|54321|foo.txt
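The two paste commands print the header and the data rows separately. If you want the paste approach to rewrite the file itself, a sketch like the following groups both commands and writes their combined output to a temporary file. It swaps two things relative to the answer, both assumptions of this sketch: yes NAME | head -n COUNT replaces the seq loop for repeating the file name, and small helper files (header.col and name.col, hypothetical names) replace the <(...) process substitutions so it also runs under plain POSIX sh. The sample file is demo data.

```shell
dir=$(mktemp -d)
f="$dir/foo.txt"
printf 'no|name|address|pos_code\n1|rick|ABC|12342\n2|rock|ABC|12342\n3|Robert|DEF|54321\n' > "$f"

name=$(basename "$f")
rows=$(tail -n +2 "$f" | wc -l)                      # number of data rows (all but the header)
printf 'name_file\n' > "$dir/header.col"             # extra column for the header line
yes "$name" | head -n "$((rows))" > "$dir/name.col"  # one copy of the name per data row

# "-" makes paste read the first column from standard input.
{
  head -n 1 "$f" | paste -d'|' - "$dir/header.col"
  tail -n +2 "$f" | paste -d'|' - "$dir/name.col"
} > "$f.tmp" && mv "$f.tmp" "$f"

cat "$f"
```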

However, the awk solution should be preferred because it is clearer (a single command, no process substitutions, etc.) and faster:

$ wc -l foo.txt
100004 foo.txt

$ time ./awk.sh >/dev/null
./awk.sh > /dev/null  0,03s user 0,01s system 98% cpu 0,041 total

$ time ./paste.sh >/dev/null
./paste.sh > /dev/null  0,38s user 0,33s system 154% cpu 0,459 total

Upvotes: 3

Freddy

Reputation: 4688

Using find and GNU awk (the -i inplace option requires gawk 4.1.0 or later):

My find implementation doesn't support -regextype posix-regex, so I used posix-extended instead, which understands the {5} interval. If your original find command already returns the right files, you can keep it.

srcdir='/root/Document/Source/'
find "$srcdir" -regextype posix-extended -iregex ".*/ABCDEF_555_[0-9]{5}\.txt$" \
    -exec awk -i inplace -v fname="{}" '
  BEGIN{ OFS=FS="|"; sub(/.*\//, "", fname) }    # set field separators / extract filename
  { $(NF+1)=NR==1 ? "name_file" : fname; print } # add header field / filename, print line
' {} \;

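The pathname found by find is passed to awk in the variable fname. In the BEGIN block, sub() removes everything up to the last slash, leaving only the file name. The main rule then appends a new field: name_file on the first line and the file name on every other line.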
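Because -i inplace rewrites the files, it can help to try the awk logic on sample input first. The sketch below runs the same BEGIN and main rules without -i inplace, reading from a pipe instead of a file, so nothing on disk changes. The path in fname is a hypothetical example, and this part works with any POSIX awk, not just GNU awk.

```shell
out=$(printf 'no|name|address|pos_code\n1|rick|ABC|12342\n' |
  awk -v fname='/root/Document/Source/ABCDEF_555_12345.txt' '
  BEGIN { OFS = FS = "|"; sub(/.*\//, "", fname) }      # keep only what follows the last "/"
  { $(NF+1) = (NR == 1 ? "name_file" : fname); print }   # a new field rebuilds $0 joined with OFS
')
printf '%s\n' "$out"
```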

The files are modified in place, so make a backup of them before running this.
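One way to take that backup, sketched here under the assumption that a copy next to each original is acceptable, is to copy every matching file to NAME.bak first. The -name 'ABCDEF_555_*.txt' pattern is a simplification of the answer's -iregex (it does not check for exactly five digits), and the temporary directory stands in for /root/Document/Source/. Because the copies end in .txt.bak, they do not match \.txt$ and will not be picked up by the find command above.

```shell
srcdir=$(mktemp -d)   # demo stand-in for /root/Document/Source/
printf 'no|name\n1|rick\n' > "$srcdir/ABCDEF_555_12345.txt"

# Copy each matching file to <file>.bak, preserving mode and timestamps (-p).
find "$srcdir" -type f -name 'ABCDEF_555_*.txt' \
  -exec sh -c 'for f in "$@"; do cp -p "$f" "$f.bak"; done' sh {} +

ls "$srcdir"
```

To undo the edit, move each .bak file back over its original.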

Upvotes: 0
