RCarmody
RCarmody

Reputation: 720

Power BI: Percent Overlap Between Groups

I have a dataset in which I have two columns: Class & Student. I am trying to find the percent overlap between classes (e.g. what percent of students in Class A are also enrolled in Class B)?

I'd assume that this should be done in PowerQuery by expanding the table over itself to compare each class and then maybe a groupby to show the percent overlap? Any ideas?

Here's some sample data and expected output:

enter image description here

Desired output (random generated percentages):

enter image description here

Upvotes: 2

Views: 710

Answers (2)

Aleksei Zhigulin
Aleksei Zhigulin

Reputation: 1634

Try this code:

let
    Source = Table.Buffer(Excel.Workbook(File.Contents("C:\Path\YourWorkbook.xlsx"))
                                         {[Item="YourTable",Kind="Table"]}[Data]),
    group = Table.Group(Source, {"Class"}, {"CompareClass", each List.Distinct(Source[Class])}),
    expand = Table.ExpandListColumn(group, "CompareClass"),
    filter = Table.SelectRows(expand, each [Class] <> [CompareClass]),
    add = Table.AddColumn(filter, "Overlap", each let
                a = Table.RowCount(Table.SelectRows(Table.Group(Source, {"Student"},
                {"c", each [Class]}), (x)=> List.ContainsAll(x[c],{[Class], [CompareClass]}))),
                b = Table.RowCount(Table.SelectRows(Source, (x)=>x[Class] = [CompareClass]))
                in a/b, Percentage.Type)
in
    add

enter image description here

Upvotes: 1

mkRabbani
mkRabbani

Reputation: 16908

You can serve your purpose in power query by creating some additional tables as stated below-

Let, your source table name: Course Column Names in your table: class, student

Now, go to power query editor and do these following steps-

Step-1: Duplicate your source table "Course" into 3 new table and name them as-

"course_1"
"course_2"
"course_3"

Step-2: Edit "course_1" advance query code with this below code-

let
    Source = Excel.Workbook(File.Contents("D:\WORK\R&D\Book2.xlsx"), null, true),
    course_percentage_Sheet = Source{[Item="course_percentage",Kind="Sheet"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(course_percentage_Sheet,{{"Column1", type text}, {"Column2", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"class", type text}, {"student", type text}}),

    //-------------------------------
    //-- This below part is new code
    //-------------------------------
    #"Grouped Rows" = Table.Group(#"Changed Type1", {"class"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows" 

Step-3: Edit "course_2" advance query code with this below code-

let
    Source = Excel.Workbook(File.Contents("D:\WORK\R&D\Book2.xlsx"), null, true),
    course_percentage_Sheet = Source{[Item="course_percentage",Kind="Sheet"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(course_percentage_Sheet,{{"Column1", type text}, {"Column2", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"class", type text}, {"student", type text}}),

    //-------------------------------
    //-- This below part are new code
    //-------------------------------
    #"Merged Queries" = Table.NestedJoin(#"Changed Type1", {"student"}, #"Changed Type1", {"student"}, "Changed Type1", JoinKind.LeftOuter),
    #"Expanded Changed Type1" = Table.ExpandTableColumn(#"Merged Queries", "Changed Type1", {"class", "student"}, {"Changed Type1.class", "Changed Type1.student"}),
    #"Grouped Rows" = Table.Group(#"Expanded Changed Type1", {"class", "Changed Type1.class"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows"

Step-4: Edit "course_3" advance query code with this below code-

let
    Source = Excel.Workbook(File.Contents("D:\WORK\R&D\Book2.xlsx"), null, true),
    course_percentage_Sheet = Source{[Item="course_percentage",Kind="Sheet"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(course_percentage_Sheet,{{"Column1", type text}, {"Column2", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"class", type text}, {"student", type text}}),

    //-------------------------------
    //-- This below part are new code
    //-------------------------------
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"student"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
    #"Added Custom" = Table.AddColumn(#"Removed Duplicates", "Custom", each 1),
    #"Merged Queries" = Table.NestedJoin(#"Added Custom", {"Custom"}, #"Added Custom", {"Custom"}, "Added Custom", JoinKind.FullOuter),
    #"Expanded Added Custom" = Table.ExpandTableColumn(#"Merged Queries", "Added Custom", {"class", "Custom"}, {"Added Custom.class", "Added Custom.Custom"}),
    #"Sorted Rows" = Table.Sort(#"Expanded Added Custom",{{"class", Order.Ascending}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"Custom", "Added Custom.Custom"}),
    #"Merged Queries1" = Table.NestedJoin(#"Removed Columns1", {"class"}, course_1, {"class"}, "course_1", JoinKind.LeftOuter),
    #"Expanded course_1" = Table.ExpandTableColumn(#"Merged Queries1", "course_1", {"Count"}, {"course_1.Count"}),
    #"Merged Queries2" = Table.NestedJoin(#"Expanded course_1", {"class", "Added Custom.class"}, course_2, {"class", "Changed Type1.class"}, "course_2", JoinKind.LeftOuter),
    #"Expanded course_2" = Table.ExpandTableColumn(#"Merged Queries2", "course_2", {"Count"}, {"course_2.Count"}),
    #"Replaced Value" = Table.ReplaceValue(#"Expanded course_2",null,0,Replacer.ReplaceValue,{"course_2.Count"}),
    #"Renamed Columns" = Table.RenameColumns(#"Replaced Value",{{"Added Custom.class", "class_compare"}, {"course_1.Count", "total_student"}, {"course_2.Count", "matched_student"}}),
    #"Added Custom1" = Table.AddColumn(#"Renamed Columns", "percentage_matched", each [matched_student]/[total_student]),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom1",{{"percentage_matched", Percentage.Type}})
in
    #"Changed Type2"

Step-5: Disable table load for "course_1" and "course_2" if you wants by right click on the table and disable tick from besides "Enable Load".

Step-6: get back to report by clicking "Close & Apply" button.

Step-7: Change data type of "percentage_matched" column in table "course_3" as percentage.

Step-8: Add three column - class, class_compare and "percentage_matched" to your table visual. The final output should be something like below image-

enter image description here

Upvotes: 0

Related Questions