Copy files containing all lines of an input file

I want to copy the files in a directory that contain all the lines of an inputFile. Here is an example:

inputFile

Line3
Line1
LineX
Line4
LineB

file1

Line1
Line2
LineX
LineB

file2

Line100
Line10
LineB
Line4
LineX
Line3
Line1
Line4
Line1

The script is expected to copy only file2 to a destination directory since all lines of the inputFile are found in file2 but not in file1.

I could compare each file with inputFile individually, as partly discussed here, and copy the file manually if the script produces no output. That is:

awk 'NR==FNR{a[$0];next}!($0 in a)' file1 inputFile
Line3
Line4
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 inputFile

The first command prints Line3 and Line4, so file1 does not need to be copied. The same command with file2 produces no output, which indicates that all lines of inputFile are found in file2, so I would run cp file2 ../distDir/.

This is time-consuming, and I hope there is a way to do it in a for loop. I am not particular about awk; any bash scripting tool can be used.

Thank you,

Upvotes: 1

Answers (3)

RavinderSingh13

Reputation: 133518

Could you please try the following and let me know if it helps. I have written "echo cp " val " destination_path" inside system(), so the command is only printed (e.g. cp file2 destination_path). Once you are happy with the echoed result, remove echo and replace destination_path with the actual path.

awk 'function check(array,val,count){
        if(length(array)==count){
           system("echo cp " val " destination_path")
}
}
FNR==NR{
  a[$0];
  next
}
val!=FILENAME{
  check(a,val,count)
}
FNR==1{
  val=FILENAME;
  count=total="";
  delete b
}
($0 in a) && !b[$0]++{
  count++
}
END{
  check(a,val,count)
}
' Input_file file1  file2

Will add explanation shortly too.

EDIT1: As per the OP, the files to be compared against Input_file could have any name, so I changed the code accordingly.

find -type f -exec awk 'function check(array,val,count){
        if(length(array)==count){
           system("echo cp " val " destination_path")
}
}
FNR==NR{
  a[$0];
  next
}
val!=FILENAME{
  check(a,val,count)
}
FNR==1{
  val=FILENAME;
  count=total="";
  delete b
}
($0 in a) && !b[$0]++{
  count++
}
END{
  check(a,val,count)
}
' Input_file {} +

Explanation: A commented version of the code follows.

find -type f -iname "file*" -exec awk 'function check(array,val,count){ ##Using find command to get only the files in a directory, using exec passing their values to awk too.From here awk code starts, creating a function named check here, which will have parameters array,val and count to be passed into it, whenever a call is being made to it.
        if(length(array)==count){                    ##Checking here if length of array is equal to variable count, if yes then do following action.
           system("echo cp " val " destination_path")##Using awk's system function to run a shell command from the awk script. echo is there for checking only: it prints the cp command for any file whose lines all match Input_file. Once OP is happy with the output, OP should remove echo.
}
}
FNR==NR{                                             ##FNR==NR condition will be only TRUE when very first file named Input_file is being read.
  a[$0];                                             ##creating an array named a whose index is current line.
  next                                               ##using next keyword will skip all further statements.
}
val!=FILENAME{                                       ##checking here when variable val is not having same value as current file name then perform following actions.
  check(a,val,count)                                 ##calling check function with passing arguments of array a,val,count.
}
FNR==1{                                              ##Checking if FNR==1, which will be true whenever a new files first line is being read.
  val=FILENAME;                                      ##creating variable named val whose value is the name of the file currently being read.
  count=total="";                                    ##Nullifying variables count and total now.
  delete b                                           ##Deleting array b here.
}
($0 in a) && !b[$0]++{                               ##Checking if the current line is in array a and has not been seen before in this file (tracked in array b), then do following
  count++                                            ##incrementing variable named count by 1 each time cursor comes inside here.
}
END{                                                 ##starting awk END block here.
  check(a,val,count)                                 ##Calling function named check with arguments array a,val and count in it.
}
' Input_file {} +                                    ##Mentioning Input_file here

PS: I wrote and tested this in GNU awk.
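
To see the counting logic on the question's sample data, here is a minimal sketch rather than a replacement for the code above. It assumes a scratch directory from mktemp, prints matching file names instead of running cp, and uses an explicit counter (the hypothetical name need) in place of length(array), so it should not depend on GNU awk. The names report, seen and matches are also illustrative.

```shell
#!/bin/bash
# Scratch directory holding the sample files from the question (assumed layout).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' Line3 Line1 LineX Line4 LineB > Input_file
printf '%s\n' Line1 Line2 LineX LineB > file1
printf '%s\n' Line100 Line10 LineB Line4 LineX Line3 Line1 Line4 Line1 > file2

# One awk pass over all files; prints the name of every file that contains
# each distinct line of Input_file.
matches=$(awk '
function report() { if (val != "" && count == need) print val }
# First file: remember each distinct line and count how many there are.
FNR == NR { if (!($0 in a)) { a[$0]; need++ }; next }
# First line of any later file: report on the previous file, then reset.
FNR == 1  { report(); val = FILENAME; count = 0; split("", seen) }
# Count every wanted line once per file.
($0 in a) && !seen[$0]++ { count++ }
# The last file has no successor, so report on it here.
END { report() }
' Input_file file1 file2)

echo "$matches"
```

With the sample data, only file2 reaches the required count of five distinct lines; file1 stops at three.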

Upvotes: 0

RomanPerekhrest

Reputation: 92854

bash (with comm + wc commands) solution:

#!/bin/bash

n=$(sort -u inputFile | wc -l)   # number of distinct lines in inputFile
for f in /yourdir/file*
do
    if (( n == $(comm -12 <(sort -u inputFile) <(sort -u "$f") | wc -l) ))
    then 
        cp "$f" "/dest/${f##*/}" 
    fi
done

comm -12 FILE1 FILE2 outputs only the lines that appear in both files (both inputs must be sorted)
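
As a quick check of what comm -12 reports for the question's sample files, here is a small sketch. It assumes a scratch directory from mktemp; the helper name common_count and the intermediate file input.sorted are made up for illustration, and it reads one input from stdin instead of using process substitution.

```shell
#!/bin/bash
# Sample data from the question, written to a scratch directory (assumed layout).
tmp=$(mktemp -d)
printf '%s\n' Line3 Line1 LineX Line4 LineB > "$tmp/inputFile"
printf '%s\n' Line1 Line2 LineX LineB > "$tmp/file1"
printf '%s\n' Line100 Line10 LineB Line4 LineX Line3 Line1 Line4 Line1 > "$tmp/file2"

# comm needs sorted input; sort -u also drops duplicate lines.
sort -u "$tmp/inputFile" > "$tmp/input.sorted"

# Count the distinct lines that inputFile shares with the file given as $1.
common_count() {
    sort -u "$1" | comm -12 "$tmp/input.sorted" - | wc -l
}

total=$(wc -l < "$tmp/input.sorted")
c1=$(common_count "$tmp/file1")
c2=$(common_count "$tmp/file2")
# $(( )) strips any padding that wc may add on some systems.
echo "inputFile: $((total)) distinct lines; file1 shares $((c1)); file2 shares $((c2))"
```

A file qualifies for copying only when its shared count equals the number of distinct lines in inputFile, which holds for file2 but not for file1.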

Upvotes: 1

Cristian Ramon-Cortes

Reputation: 1888

Assuming the following:

All the files you need to check are in the current directory
The base file is also in the current directory and named inputFile
The target path is ../distDir/

You may run a Bash script like the following, which loops over all the files, compares each one against the base file, and copies it if required.

#!/bin/bash

inputFile="./inputFile"
targetDir="../distDir/"
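for file in *; do
  # Skip directories, empty files (NR==FNR misfires on them), and the base file itself
  [ -f "$file" ] && [ -s "$file" ] && [ "./$file" != "$inputFile" ] || continue
  # Lines of inputFile that are missing from "$file"
  dif=$(awk 'NR==FNR{a[$0];next}!($0 in a)' "$file" "$inputFile")
  if [ -z "$dif" ]; then
    # File contains all lines, copy
    cp -- "$file" "$targetDir"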
  fi
done
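
A worked run on the question's sample files may help. This is only a sketch: it builds the assumed layout (source files in the current directory, destination in ../distDir/) under a temporary directory from mktemp, and it skips the base file and empty files so that neither is copied by accident. The variable copied is there only to show the result.

```shell
#!/bin/bash
# Scratch layout mirroring the assumptions above (created under a temp directory).
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/distDir"
cd "$tmp/src" || exit 1
printf '%s\n' Line3 Line1 LineX Line4 LineB > inputFile
printf '%s\n' Line1 Line2 LineX LineB > file1
printf '%s\n' Line100 Line10 LineB Line4 LineX Line3 Line1 Line4 Line1 > file2

for file in *; do
  [ "$file" = "inputFile" ] && continue   # do not test the base file against itself
  [ -s "$file" ] || continue              # an empty first file would make NR==FNR misfire
  # Lines of inputFile that do not occur in "$file"
  dif=$(awk 'NR==FNR{a[$0];next}!($0 in a)' "$file" inputFile)
  if [ -z "$dif" ]; then
    cp -- "$file" ../distDir/
  fi
done

# List what ended up in the destination directory.
copied=$(ls ../distDir)
echo "$copied"
```

Only file2 lands in the destination, because file1 is missing Line3 and Line4.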

Upvotes: 2
