user14447697

select columns from the files and save the output

I am new to programming. I have many files in a directory, as shown below, and each file contains two columns of data.

TS.TST_X.1990-11-22
TS.TST_Y.1990-11-22
TS.TST_Z.1990-11-22

TS.TST_X.1990-12-30
TS.TST_Y.1990-12-30
TS.TST_Z.1990-12-30

First, I want to take only the second column of all files that share the same name apart from the X, Y, Z part (TS.TST_X.1990-11-22, TS.TST_Y.1990-11-22, TS.TST_Z.1990-11-22) and save the output in a file named like TSTST19901122.

Similarly, for the files TS.TST_X.1990-12-30, TS.TST_Y.1990-12-30 and TS.TST_Z.1990-12-30, I want to save the output in a file named TSTST19901230.

For example, if the files contain the following:

TS.TST_X.1990-11-22                 TS.TST_Y.1990-11-22               TS.TST_Z.1990-11-22
1  2                                 1   3.4                          1    2.1
2  5                                 2   2.4                          2    4.2
3  2                                 3   1.2                          3    1.0
4  4                                 4   2.4                          4    3.5
5  8                                 5   6.3                          5    1.8

Then the output file TSTST19901122 would look like this:

2   3.4    2.1
5   2.4    4.2
2   1.2    1.0
4   2.4    3.5
8   6.3    1.8

I tried this code:

#!/bin/sh
for file in /home/min/data/*
do
awk '{print $2}' $file 
done

But my code only prints the second column of every file, one file after another, and does not give the expected output. I would appreciate expert help.

Upvotes: 0

Views: 209

Answers (3)

Akshay Hegde

Reputation: 16997

I hope the example below helps you get started. Next time you post on SO, please make sure you post the input properly so that it is easy for readers to help you:

[akshay@db1 tmp]$ cat test.sh 
#!/usr/bin/env bash 

# sort -u on the date field (field separator is a dot)
# gives one file name per unique date
while IFS= read -r f; do 

    # creates a variable holding a glob pattern like TS.TST_*.1990-11-22
    i=$(sed 's/_[^.]/_*/' <<<"$f"); 

    # modify outfile if you want any extension suffix etc
    outfile=$(sed 's/[^[:alnum:]]//g' <<<"$i")".txt";


    # filename expansion with unquoted variable
    # finally use awk to print whatever you want

    paste $i | awk 'NR>1{for(i=2; i<=NF; i+=2)printf "%s%s", $(i), (i<NF ? OFS : ORS)}' >"$outfile"

done < <(printf '%s\n' TS.TST* | sort -t'.' -u -k3,3)

[akshay@db1 tmp]$ bash test.sh 
[akshay@db1 tmp]$ cat TSTST19901122.txt 
2  3.4  2.1
5  2.4  4.2
2  1.2  1.0
4  2.4  3.5
8  6.3  1.8

Input:

[akshay@db1 tmp]$ ls TS.TST* -1
TS.TST_X.1990-11-22
TS.TST_Y.1990-11-22
TS.TST_Z.1990-11-22

[akshay@db1 tmp]$ for i in TS.TST*; do cat "$i"; done
TS.TST_X.1990-11-22 
1  2               
2  5              
3  2             
4  4          
5  8           
TS.TST_Y.1990-11-22
1   3.4
2   2.4
3   1.2
4   2.4
5   6.3
TS.TST_Z.1990-11-22
1    2.1
2    4.2
3    1.0
4    3.5
5    1.8
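
The loop depends on getting exactly one line per distinct date. Here is a minimal sketch of that step on its own, assuming empty placeholder files for the two dates from the question, created in a scratch directory (the `cut` at the end is only there to display the dates and is not part of the script above). It uses a plain string key (`-k3,3`), because a numeric key (`-n`) stops parsing at the first `-` and would treat all dates within the same year as equal:

```bash
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory so nothing real is touched

# Empty placeholder files for the two dates mentioned in the question
touch TS.TST_{X,Y,Z}.1990-11-22 TS.TST_{X,Y,Z}.1990-12-30

# Key on the third dot-separated field only; -u keeps one name per date
dates=$(printf '%s\n' TS.TST_* | sort -t. -u -k3,3 | cut -d. -f3)
printf '%s\n' "$dates"
```

Each surviving line then stands for one group of X, Y and Z files.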

Upvotes: 4

RavinderSingh13

Reputation: 133518

EDIT: Since OP mentioned in the comments that the actual file names are a little different, I am adding a solution for those names here (as per OP, there are only 3 types of files, differing in year and month).

for file in TS.TST_BHE*
do
      year=${file/*\./}
      year=${year//-/}
      yfile=${file/BHE/BHN}
      zfile=${file/BHE/BHZ}
      outfile="TSTST.$year"
      ##echo $file $yfile $zfile
      paste "$file" "$yfile" "$zfile"  | awk '{print $2,$4,$6}' > "$outfile"
done

Explanation: a detailed walkthrough of the code above.

for file in TS.TST_BHE*
## Loop over the files whose names start with TS.TST_BHE; the variable file holds the current file name.
do
      year=${file/*\./}
      ## Remove everything up to and including the last dot, leaving the date (e.g. 1990-11-22).
      year=${year//-/}
      ## Remove every - from the year variable.
      yfile=${file/BHE/BHN}
      ## Replace BHE with BHN in the file name and save the result in yfile.
      zfile=${file/BHE/BHZ}
      ## Replace BHE with BHZ in the file name and save the result in zfile.
      outfile="TSTST.$year"
      ## Build the output file name: TSTST. followed by the value of year.
      ##echo $file $yfile $zfile
      paste "$file" "$yfile" "$zfile"  | awk '{print $2,$4,$6}' > "$outfile"
      ## Use paste to join the 3 files (BHE, BHN and BHZ) line by line and print only the 2nd, 4th and 6th fields.
done
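
The parameter expansions can be checked in isolation before running the loop on real data. This is a minimal sketch, assuming bash and a hypothetical sample name TS.TST_BHE.1990-11-22; the variable names `date_part` and `compact` are chosen only for this illustration:

```bash
#!/usr/bin/env bash
# Hypothetical sample file name following the BHE/BHN/BHZ naming
file="TS.TST_BHE.1990-11-22"

date_part=${file/*\./}       # drop the longest match of "*." -> 1990-11-22
compact=${date_part//-/}     # delete every "-"               -> 19901122
yfile=${file/BHE/BHN}        # first BHE replaced by BHN
zfile=${file/BHE/BHZ}        # first BHE replaced by BHZ

printf '%s\n' "$date_part" "$compact" "$yfile" "$zfile" "TSTST.$compact"
```

These are bash substitutions, so the script should be run with bash rather than a plain POSIX sh.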


Could you please try the following, based on OP's comment that we can simply concatenate the input files without checking the 1st column's value.

for file in TS.TST_X*
do
      year=${file/*\./}
      year=${year//-/}
      yfile=${file/X/Y}
      zfile=${file/X/Z}
      outfile="TSTST.$year"
      ###echo $file $yfile $zfile ##Just to print variable values(optional)
      paste "$file" "$yfile" "$zfile"  | awk '{print $2,$4,$6}' > "$outfile"
done

For the shown samples, the above generates a file named TSTST.19901122 with the following content.

cat TSTST.19901122
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8
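
To see why fields 2, 4 and 6 are the ones to print, it can help to look at what paste produces before awk trims it. This is a minimal sketch, assuming two-row versions of the sample files created in a scratch directory:

```bash
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory so nothing real is overwritten

# Two-row versions of the sample files from the question
printf '1  2\n2  5\n'         > TS.TST_X.1990-11-22
printf '1   3.4\n2   2.4\n'   > TS.TST_Y.1990-11-22
printf '1    2.1\n2    4.2\n' > TS.TST_Z.1990-11-22

# paste joins line N of every file with tabs, giving six
# whitespace-separated fields per row: index, value, index, value, ...
paste TS.TST_X.1990-11-22 TS.TST_Y.1990-11-22 TS.TST_Z.1990-11-22

# awk then keeps only the value columns (fields 2, 4 and 6)
result=$(paste TS.TST_X.1990-11-22 TS.TST_Y.1990-11-22 TS.TST_Z.1990-11-22 |
    awk '{print $2, $4, $6}')
printf '%s\n' "$result"
```

This only lines up correctly when all three files have the same number of rows in the same order, which is the assumption OP confirmed in the comments.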

Upvotes: 3

KamilCuk

Reputation: 141020

I recreated the input files as follows:

cat <<EOF >TS.TST_X.2000-11-22 
1  2               
2  5              
3  2             
4  4          
5  8       
EOF
cat <<EOF >TS.TST_Y.2000-11-22
1   3.4
2   2.4
3   1.2
4   2.4
5   6.3
EOF
cat <<EOF >TS.TST_Z.2000-11-22
1    2.1
2    4.2
3    1.0
4    3.5
5    1.8
EOF

cat <<EOF >TS.TST_X.1990-11-22 
1  2               
2  5              
3  2             
4  4          
5  8       
EOF
cat <<EOF >TS.TST_Y.1990-11-22
1   3.4
2   2.4
3   1.2
4   2.4
5   6.3
EOF
cat <<EOF >TS.TST_Z.1990-11-22
1    2.1
2    4.2
3    1.0
4    3.5
5    1.8
EOF

When run with the following script on repl:

# get the filenames
find . -maxdepth 1 -name "TS.TST*" -printf "%f\n" |
# meh, sort them, so it looks nice
sort |
# group files according to suffix after the dot
awk -F. '
    { a[$3]=a[$3]" "$0 }
    END{ for (i in a) print i, a[i] }
' |
# here we have: YYYY-MM-DD  filename1 filename2 filename3
# let's transform it into TSTSTYYYYMMDD filename{1,2,3}
sed -E 's/^([0-9]{4})-([0-9]{2})-([0-9]{2})/TSTST\1\2\3/' |
while IFS=' ' read -r new f1 f2 f3; do
    # get second column from all files
    # if your awk doesn't sort files, they would have to be sorted here
    paste "$f1" "$f2" "$f3" | awk '{print $2,$4,$6}' > "$new"
done

# just output
for i in TSTST*; do echo "$i"; cat "$i"; done

Generates the following output:

TSTST19901122
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8
TSTST20001122
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8
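
The grouping step is the least obvious part of the pipeline, so here is a minimal sketch of it alone, assuming just the three file names for one date (no files need to exist for this part). The variable names `grouped` and `day` are chosen only for this illustration:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sample names for a single date, as in the question
names=(TS.TST_X.1990-11-22 TS.TST_Y.1990-11-22 TS.TST_Z.1990-11-22)

# -F. splits each name on dots, so $3 is the date; names sharing a date
# are appended to the same array entry, each preceded by a space
grouped=$(printf '%s\n' "${names[@]}" |
    awk -F. '{ a[$3] = a[$3] " " $0 } END { for (d in a) print d a[d] }')
printf '%s\n' "$grouped"

# read splits the grouped line back into the date and the three file names
read -r day f1 f2 f3 <<<"$grouped"
printf 'date=%s first=%s last=%s\n' "$day" "$f1" "$f3"
```

With several dates, awk's `for (d in a)` visits them in no guaranteed order, which does not matter here because each line is processed independently.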

I would advise doing some research on basic shell commands. Read the documentation for find. Read an introduction to awk and sed scripting. Read a good introduction to bash, and learn how to iterate over, sort, merge and filter lists of files in bash. Also read up on how to read a stream line by line.

Upvotes: 1
