Percentage of matching lines across multiple files

Question

I need to find common lines across multiple files: more than 100 files with millions of lines each. This is similar to the question "Shell: Find Matching Lines Across Many Files".

However, I would like to find not only the lines shared by all files, but also the lines found in all files except one, all files except two, and so on. I would like to express this as percentages: for example, which entries show up in 90% of the files, in 80%, in 70%, and so on. As an example:

File1

lineA
lineB
lineC

File2

lineB
lineC
lineD

File3

lineC
lineE
lineF

Hypothetical output for the sake of demonstration:

lineC is found in 3 out of 3 files (100.00%)

lineB is found in 2 out of 3 files (66.67%)

lineA, lineD, lineE, and lineF are each found in 1 out of 3 files (33.33%)

Does anyone know how to do it?

Thank you very much!
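
One possible approach, as a minimal sketch in Python rather than a definitive solution: count, for each distinct line, the number of files that contain it, then divide by the total number of files. The function names `line_frequencies` and `report` are hypothetical, and the sketch assumes UTF-8 text files and that all distinct lines fit in memory.

```python
from collections import Counter


def line_frequencies(paths):
    """Count, for every distinct line, how many of the files contain it."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            # A set drops duplicates within one file, so each file adds
            # at most 1 to the count of any given line.
            unique_lines = {line.rstrip("\n") for line in handle}
        counts.update(unique_lines)
    return counts


def report(counts, total_files):
    """Return one formatted row per distinct line, most widely shared first."""
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        f"{line} is found in {n} out of {total_files} files "
        f"({100 * n / total_files:.2f}%)"
        for line, n in rows
    ]
```

Calling `report(line_frequencies(paths), len(paths))` on the three example files would list lineC first at 100.00%, then lineB at 66.67%, then the remaining lines at 33.33%. To keep only entries above a threshold such as 90%, filter on `n / total_files >= 0.9` before formatting. If the distinct lines do not fit in memory, a disk-based alternative would be to deduplicate each file with `sort -u`, concatenate the results, and count repeats with `sort | uniq -c`.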
