Justin

Reputation: 95

How to join multiple txt files into one based on a column?

I have txt files, all of which are in the same directory. Each one has 2 columns of data. They look like this:

Label1 DataA1
Label2 DataA2
Label3 DataA3

I would like to use join to make one large file like this:

Label1 DataA1 DataB1 DataC1
Label2 DataA2 DataB2 DataC2
Label3 DataA3 DataB3 DataC3

Currently, I have

join fileA fileB | join - fileC

However, I have too many files to make it practical to list all of them - is there a way to write a loop for this sort of command?

Upvotes: 5

Views: 3347

Answers (3)

konsolebox

Reputation: 75498

With bash, you could write a script that builds the join pipeline recursively, exec-ing one join per file:

#!/bin/bash

if [[ $# -ge 2 ]]; then
    function __r {
        if [[ $# -gt 1 ]]; then
            exec join - "$1" | __r "${@:2}"
        else
            exec join - "$1"
        fi
    }

    __r "${@:2}" < "$1"
fi

And pass the files as parameters to the script like:

bash script.sh file*

Or, to pass the files in sorted order:

find . -maxdepth 1 -type f -name 'file*' -print0 | sort -z | xargs -0 bash script.sh
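
One thing worth checking before running the pipeline: join expects every input to be sorted on the join field (column 1 here). A minimal sketch of such a check follows; check_sorted is a hypothetical helper name, and it assumes each label appears only once per file.

```shell
#!/bin/bash
# check_sorted FILE...: report any file that is not sorted on column 1.
# Hypothetical helper, not part of the script above.
check_sorted() {
    local file status=0
    for file in "$@"; do
        # sort -c only checks the order; it does not write sorted output
        if ! sort -c -k1,1 "$file" 2>/dev/null; then
            printf 'not sorted on column 1: %s\n' "$file" >&2
            status=1
        fi
    done
    return "$status"
}
```

It could be run as check_sorted file* && bash script.sh file*, so the join only starts when all inputs pass.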

Upvotes: 4

konsolebox

Reputation: 75498

With awk you could do it like this:

awk 'NF > 0 { a[$1] = a[$1] " " $2 } END { for (i in a) { print i a[i]; } }' file*

If you want the files passed in sorted order:

find . -maxdepth 1 -type f -name 'file*' -print0 | sort -z | xargs -0 awk 'NF > 0 { a[$1] = a[$1] " " $2 } END { for (i in a) { print i a[i]; } }'

for (i in a) does not guarantee that the keys come out in the order they were added, so you may also want to sort them. The asorti function used below is only available in gawk. The alternative of recording the key order in an indexed array only works if column 1 is the same in every file.

gawk 'NF > 0 { a[$1] = a[$1] " " $2 } END { count = asorti(a, b); for (i = 1; i <= count; ++i) { j = b[i]; print j a[j]; } }' ...
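
The indexed-array idea mentioned above can be sketched in plain awk, without gawk extensions. This is only an illustration: merge_columns is a hypothetical function name, and it assumes every file contains the same labels in column 1 (a label missing from one file would shift that row's columns).

```shell
#!/bin/bash
# merge_columns FILE...: append column 2 of each file to the row keyed by
# column 1, printing keys in the order they are first seen.
merge_columns() {
    awk '
        NF > 0 {
            if (!($1 in a)) order[++n] = $1   # remember first appearance of each key
            a[$1] = a[$1] " " $2              # append the value from this file
        }
        END { for (i = 1; i <= n; ++i) print order[i] a[order[i]] }
    ' "$@"
}
```

Called as merge_columns file*, it keeps the label order of the first file rather than sorting the labels.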

Upvotes: 2

user000001

Reputation: 33327

This script joins multiple files together (the files matched by file*).

#!/bin/bash
# Create two temp files
tmp=$(mktemp)
tmp2=$(mktemp)
# remove the temp files when the script exits
trap 'rm -f "$tmp" "$tmp2"' EXIT
# for all the files
for file in file*
do
    # if the tmp file is not empty
    if [ -s "$tmp" ]
    then
        # then join the tmp file with the current file
        join "$tmp" "$file" > "$tmp2"
    else
        # the first time $tmp is empty, so we just copy the file
        cp "$file" "$tmp2"
    fi
    # the result so far becomes the left-hand input for the next join
    cp "$tmp2" "$tmp"
done

cat "$tmp"

I admit that it is ugly, but it seems to work.
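
A possible variant of the same loop keeps the intermediate result in a shell variable instead of temp files. This is a sketch, not a tested replacement: join_all is a hypothetical name, and it assumes the files are already sorted on column 1 and small enough to hold in memory.

```shell
#!/bin/bash
# join_all FILE...: join all files on column 1, left to right.
join_all() {
    local result file
    result=$(cat "$1")      # start with the first file
    shift
    for file in "$@"; do
        # feed the result so far to join on stdin ("-") and join the next file
        result=$(printf '%s\n' "$result" | join - "$file")
    done
    printf '%s\n' "$result"
}
```

It would be called as join_all file* > merged.txt, where merged.txt is just an example output name.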

Upvotes: 0
