Ashley
Ashley

Reputation: 27

Script to map columns in two csv files

I am trying to map columns values in two .csv files. Its a high dimensional data file but I am only concerned about mapping and adding a new field in the second .csv file.

My File1 .csv files have labels: #Sender ID, #Type; My File2 .csv files have labels: #Sender ID, #Label;

What I am trying to do is if #sender ID(File1) = sender ID(File2), then add a new column in file 2, defining the type. Because sender ID is randomized and repetitive, it's not possible to simply copy paste the #type->#Label column in file 2.

I looked into "awk" command line but not sure how to proceed with it and also if that's will help to resolve the issue or not. Please suggest.

File1.csv(Input) #Sender ID,#Recieve ID, #Noise, #Position, #Type, #velocity 23,value,value,value, 1, value 43,value,value,value, 2, value 12,value,value,value, 1, value 78,value,value,value, 0, value 43,value,value,value, 2, value

File2.csv(Input) #Sender ID,#Acc,#Angle,#Label 23,value, value, NaN 43,value,value, NaN 12,value,value, NaN 78, value,value,NaN 43, value,value, NaN

File2.csv(Expected Output) #Sender ID,#Acc,#Angle,#Label 23,value, value, 1 43,value, value, 2 12,value, value, 1 78,value, value, 0 43,value, value, 2

Original File 1 Headers: sender, attackerType, messageID

Original File 2 Headers: type, rcvTime, sendTime, sender, messageID, pos__001, pos__002, pos__003, pos_noise__001, pos_noise__002, pos_noise__003, noise__001, noise__002, noise__003, spd__001, spd__002, spd__003, spd_noise__001, spd_noise__002, spd_noise__003, RSSI, Label

Expected File 2 Headers should match the #sender with File 1 and then update the #Label field (label attribute is originally empty) with the corresponding #attackerType.

Here what I did:`awk 'BEGIN {FS=OFS=","}

NR==FNR {a[$1]=$2; next} {$4=a[$22]}1' File1.csv File2.csv`

Remember the number of rows is not same for both the files.

Upvotes: 1

Views: 1166

Answers (1)

karakfa
karakfa

Reputation: 67467

awk to the rescue!

$ awk 'BEGIN   {FS=OFS=","} 
       NR==FNR {a[$1]=$5; next} 
       FNR>1   {$4=a[$1]}1' file1 file2

if your delimiters have uneven spaces around you may need to play with the delimiters.

If there are extra keys in file2, the value NAN will be removed. To prevent that you may want to add && ($1 in a) to FNR>1 condition.

Upvotes: 2

Related Questions