Reputation: 27
I am trying to map columns values in two .csv files. Its a high dimensional data file but I am only concerned about mapping and adding a new field in the second .csv file.
My File1 .csv files have labels: #Sender ID, #Type; My File2 .csv files have labels: #Sender ID, #Label;
What I am trying to do is if #sender ID(File1) = sender ID(File2), then add a new column in file 2, defining the type. Because sender ID is randomized and repetitive, it's not possible to simply copy paste the #type->#Label column in file 2.
I looked into "awk" command line but not sure how to proceed with it and also if that's will help to resolve the issue or not. Please suggest.
File1.csv(Input)
#Sender ID,#Recieve ID, #Noise, #Position, #Type, #velocity
23,value,value,value, 1, value
43,value,value,value, 2, value
12,value,value,value, 1, value
78,value,value,value, 0, value
43,value,value,value, 2, value
File2.csv(Input)
#Sender ID,#Acc,#Angle,#Label
23,value, value, NaN
43,value,value, NaN
12,value,value, NaN
78, value,value,NaN
43, value,value, NaN
File2.csv(Expected Output)
#Sender ID,#Acc,#Angle,#Label
23,value, value, 1
43,value, value, 2
12,value, value, 1
78,value, value, 0
43,value, value, 2
Original File 1 Headers:
sender, attackerType, messageID
Original File 2 Headers:
type, rcvTime, sendTime, sender, messageID, pos__001, pos__002, pos__003, pos_noise__001, pos_noise__002, pos_noise__003, noise__001, noise__002, noise__003, spd__001, spd__002, spd__003, spd_noise__001, spd_noise__002, spd_noise__003, RSSI, Label
Expected File 2 Headers should match the #sender with File 1 and then update the #Label field (label attribute is originally empty) with the corresponding #attackerType.
Here what I did:`awk 'BEGIN {FS=OFS=","}
NR==FNR {a[$1]=$2; next} {$4=a[$22]}1' File1.csv File2.csv`
Remember the number of rows is not same for both the files.
Upvotes: 1
Views: 1166
Reputation: 67467
awk
to the rescue!
$ awk 'BEGIN {FS=OFS=","}
NR==FNR {a[$1]=$5; next}
FNR>1 {$4=a[$1]}1' file1 file2
if your delimiters have uneven spaces around you may need to play with the delimiters.
If there are extra keys in file2, the value NAN
will be removed. To prevent that you may want to add && ($1 in a)
to FNR>1
condition.
Upvotes: 2