Reputation: 19947
I have some per-minute stats saved in text files named 1min.txt, 2min.txt, and so on.
1min.txt
F1,21
F2,32
F3,22
2min.txt
F2,12
F4,32
I would like to combine these files in the following format:
combined.txt
Field 1min 2min
F1 21 0
F2 32 12
F3 22 0
F4 0 32
Some fields may not exist in some files, and 0 should be entered for those fields.
I've tried to do this with awk but couldn't find an easy way. Can someone please help?
Thanks
Upvotes: 1
Views: 729
Reputation: 77075
Using awk:
awk -F, '
!seen[FILENAME]++ {
    fname[++numFile] = FILENAME
}
{
    flds[$1]++;
    map[FILENAME,$1] = $2
}
END {
    printf "%-10s", "FIELD";
    for (cnt=1; cnt<=numFile; cnt++) {
        file = fname[cnt];
        sub (/\.txt$/, "", file);
        printf "%-10s", file;
    }
    print "";
    for (fld in flds) {
        printf "%-10s", fld;
        for (cnt=1; cnt<=numFile; cnt++) {
            printf "%-10s", map[fname[cnt],fld]+0
        }
        print "";
    }
}' 1min.txt 2min.txt
FIELD 1min 2min
F1 21 0
F2 32 12
F3 22 0
F4 0 32
Once you have reviewed the output, you can redirect it to another file. You can pass as many files at the end as you want; if you have a lot of them, you can use a shell glob instead, e.g. *.txt.
Note: the order of fields in the output is not guaranteed, since they are not always present in all files.
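If you do want a deterministic order, one portable option (a sketch, not part of the original answer; combined.txt is the target name from the question) is to keep the header line first and sort the remaining lines:

```shell
# Recreate the sample inputs from the question.
printf 'F1,21\nF2,32\nF3,22\n' > 1min.txt
printf 'F2,12\nF4,32\n' > 2min.txt

# Same idea as the awk program above, but the unordered "for (fld in flds)"
# output is piped through sort while the header line is passed through first.
awk -F, '
!seen[FILENAME]++ { fname[++numFile] = FILENAME }
{ flds[$1]; map[FILENAME,$1] = $2 }
END {
    printf "%-10s", "FIELD"
    for (cnt=1; cnt<=numFile; cnt++) {
        file = fname[cnt]
        sub(/\.txt$/, "", file)
        printf "%-10s", file
    }
    print ""
    for (fld in flds) {
        printf "%-10s", fld
        for (cnt=1; cnt<=numFile; cnt++)
            printf "%-10s", map[fname[cnt],fld]+0
        print ""
    }
}' 1min.txt 2min.txt | { IFS= read -r header; printf '%s\n' "$header"; sort; } > combined.txt
```

GNU awk users could instead set PROCINFO["sorted_in"] inside the program, but the pipe-through-sort approach works with any awk.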
And just for fun, here is an obfuscated Perl one-liner (JAPH-style) that does the same:
perl -F, -lane'
$f{$ARGV}++; $h{$F[0]}
{$ARGV}= $F[ 1 ]
}{print join"\t",
"FIELD", map{s/.[tx]+
//x ;$_}sort{$a
<=>$b} keys%f;print
join"\n", map{$f
=$_; join
"\t", $f,map
{$h{$f
}{$_}
//=0}
sort{$a
<=>$b}
keys%f
}sort
keys%h;
' *.txt
FIELD 1min 2min
F1 21 0
F2 32 12
F3 22 0
F4 0 32
Upvotes: 3
Reputation: 97918
Using join:
join -t , input1 input2 -j 1 -o "0 1.2 2.2" -e 0 -a1 -a2 | column -t -s,
Gives:
F1 21 0
F2 32 12
F3 22 0
F4 0 32
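One caveat: join requires both inputs to be sorted on the join field. The sample files in the question happen to be sorted already; for arbitrary input, sort each file first (a sketch; input1.sorted and input2.sorted are throwaway names, not from the original answer):

```shell
# join needs its inputs sorted on the join field (field 1 here),
# so sort each file before joining.
printf 'F3,22\nF1,21\nF2,32\n' > input1    # deliberately out of order
printf 'F4,32\nF2,12\n' > input2
sort input1 > input1.sorted
sort input2 > input2.sorted
join -t , input1.sorted input2.sorted -j 1 -o "0 1.2 2.2" -e 0 -a1 -a2
```

This prints F1,21,0 / F2,32,12 / F3,22,0 / F4,0,32; pipe it through column -t -s, for the aligned output shown above.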
To add a header:
join -t , input1 input2 -j 1 -o "0 1.2 2.2" -e 0 -a1 -a2 | \
sed '1iField,1min,2min' | column -t -s,
And the result looks like:
Field 1min 2min
F1 21 0
F2 32 12
F3 22 0
F4 0 32
Upvotes: 1
Reputation: 203129
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ keys[$1]; val[$1,NR==FNR] = $2 }
END {
    print "Field", "1min", "2min"
    for (key in keys) {
        print key, val[key,1]+0, val[key,0]+0
    }
}
$ awk -f tst.awk 1min.txt 2min.txt
Field 1min 2min
F1 21 0
F2 32 12
F3 22 0
F4 0 32
If you care about the output order, tell us what order you're looking for - the order the keys were seen across both files, alphabetical, or something else. If it's the order they were seen, then that'd be:
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
!seen[$1]++ { keys[++numKeys] = $1 }
{ val[$1,NR==FNR] = $2 }
END {
    print "Field", "1min", "2min"
    for (k=1; k<=numKeys; k++) {
        key = keys[k]
        print key, val[key,1]+0, val[key,0]+0
    }
}
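For completeness, the insertion-order variant can be exercised end to end against the sample files from the question (a throwaway demo harness, not part of the original answer):

```shell
# Recreate the question's inputs and the script, then run it.
printf 'F1,21\nF2,32\nF3,22\n' > 1min.txt
printf 'F2,12\nF4,32\n' > 2min.txt
cat > tst.awk <<'EOF'
BEGIN { FS=","; OFS="\t" }
!seen[$1]++ { keys[++numKeys] = $1 }
{ val[$1,NR==FNR] = $2 }
END {
    print "Field", "1min", "2min"
    for (k=1; k<=numKeys; k++) {
        key = keys[k]
        print key, val[key,1]+0, val[key,0]+0
    }
}
EOF
awk -f tst.awk 1min.txt 2min.txt
```

The rows come out F1, F2, F3, F4 - the order the keys were first seen across the two files.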
Upvotes: 1
Reputation: 366
I have written some Python code to solve your problem.
fh_1 = open("1min.txt", "r")
fh_2 = open("2min.txt", "r")
fh_3 = open("combine.txt", "w")
min_c_1 = {}
min_c_2 = {}
lines_of_text = ["Field 1min 2min\n"]
for l1 in fh_1.readlines():
    data = l1.split(',')
    min_c_1[data[0]] = data[1].rstrip()
for l1 in fh_2.readlines():
    data = l1.split(',')
    min_c_2[data[0]] = data[1].rstrip()
for key in min_c_1.keys():
    if key in min_c_2.keys():
        msg = str(key) + " " + str(min_c_1[key]) + " " + str(min_c_2[key]) + "\n"
        lines_of_text.append(msg)
        del min_c_2[key]
    else:
        msg = str(key) + " " + str(min_c_1[key]) + " 0" + "\n"
        lines_of_text.append(msg)
for key in min_c_2.keys():
    msg = str(key) + " 0" + " " + str(min_c_2[key]) + "\n"
    lines_of_text.append(msg)
fh_3.writelines(lines_of_text)
fh_1.close()
fh_2.close()
fh_3.close()
Please let me know if it doesn't work for you.
Upvotes: 0
Reputation: 64298
Awk allows you to read from files explicitly, so you can put all of the logic in a BEGIN block if you want. Here is an example:
awk -F, '
BEGIN {
    while ((getline < "1min.txt") > 0) {
        field[$1] = 1
        a1[$1] = $2
    }
    while ((getline < "2min.txt") > 0) {
        field[$1] = 1
        a2[$1] = $2
    }
    print "Field\t1min\t2min"
    for (x in field) {
        print x"\t"(a1[x]+0)"\t"(a2[x]+0)
    }
}
'
Upvotes: 0