Reputation: 95
I have txt files, all of which are in the same directory. Each one has 2 columns of data. They look like this:
Label1 DataA1
Label2 DataA2
Label3 DataA3
I would like to use join to make one large file like this:
Label1 DataA1 DataB1 DataC1
Label2 DataA2 DataB2 DataC2
Label3 DataA3 DataB3 DataC3
Currently, I have
join fileA fileB | join - fileC
However, I have too many files for it to be practical to list them all. Is there a way to write a loop for this sort of command?
Upvotes: 5
Views: 3347
Reputation: 75498
With bash you could create a script that recursively pipes the output of each join into the next one:
#!/bin/bash
if [[ $# -ge 2 ]]; then
    function __r {
        if [[ $# -gt 1 ]]; then
            exec join - "$1" | __r "${@:2}"
        else
            exec join - "$1"
        fi
    }
    __r "${@:2}" < "$1"
fi
And pass the files as parameters to the script like:
bash script.sh file*
Or, to pass the files in sorted order:
find . -maxdepth 1 -type f -name 'file*' -print0 | sort -z | xargs -0 bash script.sh
Upvotes: 4
Reputation: 75498
With awk you could do it like this:
awk 'NF > 0 { a[$1] = a[$1] " " $2 } END { for (i in a) { print i a[i]; } }' file*
If you want to sort your files:
find . -maxdepth 1 -type f -name 'file*' -print0 | sort -z | xargs -0 awk 'NF > 0 { a[$1] = a[$1] " " $2 } END { for (i in a) { print i a[i]; } }'
Note that for (i in a) does not necessarily visit the keys in the order they were added, so you may also want to sort the output; the asorti function used below is only available in gawk. The alternative of recording the keys in an indexed array to preserve their order only works if column 1 is the same in every file.
gawk 'NF > 0 { a[$1] = a[$1] " " $2 } END { count = asorti(a, b); for (i = 1; i <= count; ++i) { j = b[i]; print j a[j]; } }' ...
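If sorted output isn't what you want, recording each label in an indexed array the first time it appears keeps the original order in plain awk. This is only a minimal sketch, assuming every file lists the same labels in column 1; merge_columns is a made-up name, not something from the commands above:

```shell
# merge_columns FILE...  (hypothetical helper name)
# Appends column 2 of every file to the row keyed by column 1 and prints
# the rows in the order their labels were first seen.
merge_columns() {
    awk '
        NF > 0 {
            # remember a label the first time it shows up
            if (!($1 in a)) order[++n] = $1
            a[$1] = a[$1] " " $2
        }
        END {
            # walk the labels in first-seen order instead of for (i in a)
            for (i = 1; i <= n; ++i) print order[i] a[order[i]]
        }
    ' "$@"
}
```

You would call it as merge_columns file*. If the labels differ between files, rows just end up with fewer columns, so the columns are not guaranteed to line up.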
Upvotes: 2
Reputation: 33327
This script joins multiple files together (the files are file*).
#!/bin/bash
# Create two temp files
tmp=$(mktemp)
tmp2=$(mktemp)
# for all the files
for file in file*
do
    # if the tmp file is not empty
    if [ -s "$tmp" ]
    then
        # then join the tmp file with the current file
        join "$tmp" "$file" > "$tmp2"
    else
        # the first time $tmp is empty, so we just copy the file
        cp "$file" "$tmp2"
    fi
    cp "$tmp2" "$tmp"
done
cat "$tmp"
# remove the temp files once the result has been printed
rm -f "$tmp" "$tmp2"
I admit that it is ugly, but it seems to work.
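Two things to watch for with the loop above: join expects its inputs to be sorted on the join field, and the [ -s "$tmp" ] test cannot tell "first iteration" apart from "the previous join produced no output". A rough variant that sorts each file first and tracks the first iteration with a flag might look like this (join_all is a made-up name, and I have only tried it on small inputs like the ones in the question):

```shell
# join_all FILE...  (hypothetical function name)
# The body runs in a subshell, so its variables do not leak into the caller.
join_all() (
    acc=$(mktemp) || exit 1
    next=$(mktemp) || { rm -f "$acc"; exit 1; }
    first=1
    for file in "$@"; do
        if [ "$first" -eq 1 ]; then
            # first file: just store a sorted copy
            sort -k 1b,1 "$file" > "$next"
            first=0
        else
            # later files: sort on the label, then join with the result so far
            sort -k 1b,1 "$file" | join "$acc" - > "$next"
        fi
        mv "$next" "$acc"
    done
    cat "$acc"
    # clean up the temp files
    rm -f "$acc" "$next"
)
```

Usage would be join_all file* > merged.txt, where merged.txt is just an example output name.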
Upvotes: 0