Allen Qin
Allen Qin

Reputation: 19947

Combine multiple text files using awk

I have some minutely stats saved in text files and named as 1min.txt, 2min.txt etc.

1min.txt
F1,21
F2,32
F3,22

2min.txt
F2,12
F4,32

I would like to combine these files in the following format:

combined.txt
Field   1min    2min
F1      21      0
F2      32      12
F3      22      0
F4      0       32

Some fields may not exist in some files and 0 will be entered for those fields.

I've tried to do it using awk but couldn't find an easy way. Can someone please help?

Thanks

Upvotes: 1

Views: 729

Answers (5)

jaypal singh
jaypal singh

Reputation: 77075

Using awk:

awk -F, '
!seen[FILENAME]++ { 
    fname[++numFile] = FILENAME 
}
{
    flds[$1]++;
    map[FILENAME,$1] = $2
}
END {
    printf "%-10s", "FIELD";
    for (cnt=1; cnt<=numFile; cnt++) {
        file = fname[cnt];
        sub (/.txt/, "", file);
        printf "%-10s", file; 
    }
    print ""; 
    for (fld in flds) { 
        printf "%-10s", fld; 
        for (cnt=1; cnt<=numFile; cnt++) {
            printf "%-10s", map[fname[cnt],fld]+0
        }
        print "";
    }
}' 1min.txt 2min.txt

Output:

FIELD      1min       2min      
F1         21         0         
F2         32         12        
F3         22         0         
F4         0          32     

Once you have reviewed the output, you can re-direct the output to another file. You can pass as many files at the end as you want. If you have way too many then you can even use shell glob, for eg: *.txt

Note: I haven't guaranteed the order of fields since they are not always present in all files.

Here is a pure fun perl japh that will do the same:

perl -F, -lane'
$f{$ARGV}++; $h{$F[0]}
{$ARGV}=      $F[  1  ]
}{print       join"\t",
"FIELD",      map{s/.[tx]+
//x           ;$_}sort{$a
<=>$b}        keys%f;print
join"\n",    map{$f
=$_;         join
"\t",  $f,map
{$h{$f
}{$_}
//=0}
sort{$a
<=>$b}
keys%f
}sort
keys%h;
' *.txt

Output:

FIELD   1min    2min
F1      21      0
F2      32      12
F3      22      0
F4      0       32

Upvotes: 3

perreal
perreal

Reputation: 97918

Using join:

join -t , input1 input2 -j 1 -o "0 1.2 2.2" -e 0 -a1 -a2 | column -t -s,

Gives:

F1  21  0
F2  32  12
F3  22  0
F4  0   32

To add a header:

join -t , input1 input2 -j 1 -o "0 1.2 2.2" -e 0 -a1 -a2 | \
     sed '1iField,1min,2min' | column -t -s,

And the result looks like:

Field  1min  2min
F1     21    0
F2     32    12
F3     22    0
F4     0     32

Upvotes: 1

Ed Morton
Ed Morton

Reputation: 203129

$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ keys[$1]; val[$1,NR==FNR] = $2 }
END {
    print "Field", "1min", "2min"
    for (key in keys) {
        print key, val[key,1]+0, val[key,0]+0
    }
}

$ awk -f tst.awk 1min.txt 2min.txt
Field   1min    2min
F1      21      0
F2      32      12
F3      22      0
F4      0       32

If you care about the output order, tell us what order you're looking for - the order the keys were seen across both files or alphabetical or something else. If it's the order they are seen then that'd be:

$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
!seen[$1]++ { keys[++numKeys] = $1 }
{ val[$1,NR==FNR] = $2 }
END {
    print "Field", "1min", "2min"
    for (k=1; k<=numKeys; k++) {
        key = keys[k]
        print key, val[key,1]+0, val[key,0]+0
    }
}

Upvotes: 1

Abhishek Mittal
Abhishek Mittal

Reputation: 366

I have written some python code to solve you problem.

fh_1 = open("1min.txt", "r")
fh_2 = open("2min.txt", "r")
fh_3 = open("combine.txt", "w")

min_c_1 = {}
min_c_2 = {}

lines_of_text = ["Field   1min    2min\n"]

for l1 in fh_1.readlines():
    data = l1.split(',')
    min_c_1[data[0]] = data[1].rstrip()

for l1 in fh_2.readlines():
    data = l1.split(',')
    min_c_2[data[0]] = data[1].rstrip()

for key in min_c_1.keys():
    if key in min_c_2.keys():
        msg = str(key) + "      " + str(min_c_1[key]) + "      " +  str(min_c_2[key]) + "\n"
        lines_of_text.append(msg)
        del min_c_2[key]
    else:
        msg = str(key) + "      " + str(min_c_1[key]) + "      0" + "\n"
        lines_of_text.append(msg)

for key in min_c_2.keys():
    msg = str(key) + "      0" + "       " + str(min_c_2[key]) + "\n"
    lines_of_text.append(msg)

fh_3.writelines(lines_of_text)

fh_1.close()
fh_2.close()
fh_3.close()

Please let me know if it does not help.

Upvotes: 0

Vaughn Cato
Vaughn Cato

Reputation: 64298

Awk allows you to explicitly read from files, so you can just put all the logic in a BEGIN section if you want. Here is an example:

awk -F, '
  BEGIN {
    while (getline <"1min.txt") {
      field[$1]=1
      a1[$1]=$2
    }
    while (getline <"2min.txt") {
      field[$1]=1
      a2[$1]=$2
    }
    print "Field\t1min\t2min"
    for (x in field) {
      print x"\t"(a1[x]+0)"\t"(a2[x]+0)
    }
  }
'

Upvotes: 0

Related Questions