Modify first column on my data file

I am trying to modify the first column of an xyz file. I tried awk and the replacement worked, but some lines are missing from the new file that I created with the modified data. Here is the head of the original data:

1500    
Atoms. Timestep: 0    
1 6.3115 6.3115 6.36745    
2 6.3115 6.3115 9.47036    
2 6.3115 3.15575 6.39316    
2 3.15575 6.3115 6.39316    
3 3.15575 3.15575 8.83622    
4 3.15575 3.15575 3.90335    
5 8.53643 8.92983 8.45625    
5 4.08657 8.92983 8.45625

I used this command to replace the first column using a second file:

awk 'NR==FNR{a[$1]=$2;next} {$1=a[$1]}1' reemp.txt traj300.xyz > tra300.xyz

But now the new file looks like this

Timestep: 0    
Pb 6.3115 6.3115 6.36745    
I 6.3115 6.3115 9.47036    
I 6.3115 3.15575 6.39316    
I 3.15575 6.3115 6.39316    
C 3.15575 3.15575 8.83622    
N 3.15575 3.15575 3.90335    
Hc 8.53643 8.92983 8.45625    
Hc 4.08657 8.92983 8.45625

The replacement itself was correct, but the first line and part of the second line were erased. The file has 75 million lines with many timesteps and configurations, and the same lines were erased in every configuration.

Upvotes: 1

Views: 50

Answers (2)

Marc Lambrichs

Reputation: 2892

There are two things wrong in your script. I've put your original data in a file input.txt to check what the first part of your awk code puts into array a.

$ awk '{a[$1]=$2;next} END {for (i in a) print i"\t-> "a[i]}' input.txt
Atoms.  -> Timestep:         # <- 
1   -> 6.3115
2   -> 3.15575
3   -> 3.15575
4   -> 3.15575
5   -> 4.08657
1500    ->                   # <-

I'm sure these are not the values you want in a. Now, let's take a look at the second part of your code:

{$1=a[$1]}

This replaces the first column on every line with whatever a holds for the current $1, that is, column 2 from your first file. If $1 cannot be found in a, the lookup returns an empty string and the first column is erased. (I have my doubts whether this is what you really want. Isn't it column 2 in file 2 that you want to replace?) We don't know what your other input file looks like, but with input.txt as the first file we do know that:

  • whenever first column = "Atoms." it will be replaced by "Timestep:"
  • whenever first column = 1500 it will be replaced by ""
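
The lookup-miss case can also be reproduced in isolation. This is only a sketch: map.txt and data.xyz are hypothetical files made up for the illustration, with a single mapping entry, so every other first field is a miss.

```shell
# Hypothetical files, for illustration only.
printf '1 Pb\n' > map.txt
printf '1500\nAtoms. Timestep: 0\n1 6.3115 6.3115 6.36745\n' > data.xyz

# No membership test: a key missing from a[] yields "", which overwrites $1.
awk 'NR==FNR{a[$1]=$2;next} {$1=a[$1]}1' map.txt data.xyz > blanked.xyz
cat blanked.xyz
```

The "1500" line comes out empty and "Atoms." disappears from the second line, while the atom line with key 1 is mapped to Pb.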

Because you didn't show us your other input file, we can't be sure what happens to your first two lines. But, to give you an example, let's feed awk the same input file twice:

$ awk 'NR==FNR{a[$1]=$2;next} {$1=a[$1]}1' input.txt input.txt

Timestep: Timestep: 0
6.3115 6.3115 6.3115 6.36745
3.15575 6.3115 6.3115 9.47036
3.15575 6.3115 3.15575 6.39316
3.15575 3.15575 6.3115 6.39316
3.15575 3.15575 3.15575 8.83622
3.15575 3.15575 3.15575 3.90335
4.08657 8.53643 8.92983 8.45625
4.08657 4.08657 8.92983 8.45625

This is probably not what you want to do. I suppose you need to select the lines whose first column really should be stored in array a. The second thing that's wrong with your awk is that it changes column 1 on every line of the second file, whether or not a replacement exists. You need to check whether $1 is in array a and replace it only then.

So, maybe something like this?

$ awk 'NR==FNR {if ($1~/[0-9]+/ && $2~/[0-9]+\.[0-9]+/) a[$1]=$2;next}
($1 in a){$1=a[$1]}1' input.txt input.txt
1500
Atoms. Timestep: 0
6.3115 6.3115 6.3115 6.36745
3.15575 6.3115 6.3115 9.47036
3.15575 6.3115 3.15575 6.39316
3.15575 3.15575 6.3115 6.39316
3.15575 3.15575 3.15575 8.83622
3.15575 3.15575 3.15575 3.90335
4.08657 8.53643 8.92983 8.45625
4.08657 4.08657 8.92983 8.45625

Explanation:

NR==FNR {                       # only for lines from the first input file
   if ($1~/[0-9]+/ &&           # if column 1 contains digits AND
       $2~/[0-9]+\.[0-9]+/)     # column 2 contains a decimal number
      a[$1]=$2;                 # save column 2 in array a with index $1
   next                         # skip to next, for every line in file 1
}
($1 in a){                      # if column 1 exists in array a
  $1=a[$1]                      # replace column 1 by corresponding column 2 from file 1
}
1                               # print line
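
One caveat, as a sketch rather than a definitive fix: the patterns are unanchored, so they match any field that merely contains digits. Anchoring them with ^ and $ restricts the match to fields that consist only of a number. The file keys.txt below is hypothetical and exists only to show the difference.

```shell
# Hypothetical sample: the first line has digits embedded in other text.
printf 'x1y 2.5z\n7 3.25\n' > keys.txt

# Unanchored patterns: both lines pass the filter.
awk '$1~/[0-9]+/ && $2~/[0-9]+\.[0-9]+/ {print $1}' keys.txt > loose.txt

# Anchored patterns: only the purely numeric line passes.
awk '$1~/^[0-9]+$/ && $2~/^[0-9]+\.[0-9]+$/ {print $1}' keys.txt > strict.txt

cat loose.txt strict.txt
```

For the sample data in the question both versions select the same lines, so the anchors only matter if the real file contains mixed fields.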

Upvotes: 0

Walter A

Reputation: 20022

Your problem is that not every first field in traj300.xyz can be found in reemp.txt. Using the head of your input data, I can reproduce your problem with the following reemp.txt:

1 Pb
2 I
3 C
4 N
5 H

The first field should only be replaced when that field is found in the array. You must add a check in your awk:

awk 'NR==FNR{a[$1]=$2;next} $1 in a {$1=a[$1]}1' reemp.txt traj300.xyz
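
To try this without touching the 75-million-line file, here is a minimal self-contained run. It assumes the reemp.txt shown above and a shortened copy of the head of the trajectory; traj_head.xyz and out_head.xyz are names invented for the test.

```shell
# Mapping from the answer, plus a few lines from the head of the question's data.
printf '1 Pb\n2 I\n3 C\n4 N\n5 H\n' > reemp.txt
printf '1500\nAtoms. Timestep: 0\n1 6.3115 6.3115 6.36745\n2 6.3115 6.3115 9.47036\n5 8.53643 8.92983 8.45625\n' > traj_head.xyz

# Replace $1 only when it is a key in a[]; every other line is printed unchanged.
awk 'NR==FNR{a[$1]=$2;next} $1 in a {$1=a[$1]}1' reemp.txt traj_head.xyz > out_head.xyz
cat out_head.xyz
```

Both header lines survive and the atom IDs become Pb, I and H. Note that a header line consisting only of an atom count that happens to equal a mapped ID (a frame with 3 atoms, say) would still be rewritten; if that can occur in your data, an extra condition such as NF>1 on the replacement rule is one possible guard.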

Upvotes: 1
