Reputation: 720
I have a dataset in which I have two columns: Class & Student. I am trying to find the percent overlap between classes (e.g. what percent of students in Class A are also enrolled in Class B)?
I'd assume that this should be done in PowerQuery by expanding the table over itself to compare each class and then maybe a groupby to show the percent overlap? Any ideas?
Here's some sample data and expected output:
Desired output (random generated percentages):
Upvotes: 2
Views: 710
Reputation: 1634
Try this code:
let
Source = Table.Buffer(Excel.Workbook(File.Contents("C:\Path\YourWorkbook.xlsx"))
{[Item="YourTable",Kind="Table"]}[Data]),
group = Table.Group(Source, {"Class"}, {"CompareClass", each List.Distinct(Source[Class])}),
expand = Table.ExpandListColumn(group, "CompareClass"),
filter = Table.SelectRows(expand, each [Class] <> [CompareClass]),
add = Table.AddColumn(filter, "Overlap", each let
a = Table.RowCount(Table.SelectRows(Table.Group(Source, {"Student"},
{"c", each [Class]}), (x)=> List.ContainsAll(x[c],{[Class], [CompareClass]}))),
b = Table.RowCount(Table.SelectRows(Source, (x)=>x[Class] = [CompareClass]))
in a/b, Percentage.Type)
in
add
Upvotes: 1
Reputation: 16908
You can serve your purpose in power query by creating some additional tables as stated below-
Let, your source table name: Course Column Names in your table: class, student
Now, go to power query editor and do these following steps-
Step-1: Duplicate your source table "Course" into 3 new table and name them as-
"course_1"
"course_2"
"course_3"
Step-2: Edit "course_1" advance query code with this below code-
let
Source = Excel.Workbook(File.Contents("D:\WORK\R&D\Book2.xlsx"), null, true),
course_percentage_Sheet = Source{[Item="course_percentage",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(course_percentage_Sheet,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"class", type text}, {"student", type text}}),
//-------------------------------
//-- This below part is new code
//-------------------------------
#"Grouped Rows" = Table.Group(#"Changed Type1", {"class"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
#"Grouped Rows"
Step-3: Edit "course_2" advance query code with this below code-
let
Source = Excel.Workbook(File.Contents("D:\WORK\R&D\Book2.xlsx"), null, true),
course_percentage_Sheet = Source{[Item="course_percentage",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(course_percentage_Sheet,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"class", type text}, {"student", type text}}),
//-------------------------------
//-- This below part are new code
//-------------------------------
#"Merged Queries" = Table.NestedJoin(#"Changed Type1", {"student"}, #"Changed Type1", {"student"}, "Changed Type1", JoinKind.LeftOuter),
#"Expanded Changed Type1" = Table.ExpandTableColumn(#"Merged Queries", "Changed Type1", {"class", "student"}, {"Changed Type1.class", "Changed Type1.student"}),
#"Grouped Rows" = Table.Group(#"Expanded Changed Type1", {"class", "Changed Type1.class"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
#"Grouped Rows"
Step-4: Edit "course_3" advance query code with this below code-
let
Source = Excel.Workbook(File.Contents("D:\WORK\R&D\Book2.xlsx"), null, true),
course_percentage_Sheet = Source{[Item="course_percentage",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(course_percentage_Sheet,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"class", type text}, {"student", type text}}),
//-------------------------------
//-- This below part are new code
//-------------------------------
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"student"}),
#"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
#"Added Custom" = Table.AddColumn(#"Removed Duplicates", "Custom", each 1),
#"Merged Queries" = Table.NestedJoin(#"Added Custom", {"Custom"}, #"Added Custom", {"Custom"}, "Added Custom", JoinKind.FullOuter),
#"Expanded Added Custom" = Table.ExpandTableColumn(#"Merged Queries", "Added Custom", {"class", "Custom"}, {"Added Custom.class", "Added Custom.Custom"}),
#"Sorted Rows" = Table.Sort(#"Expanded Added Custom",{{"class", Order.Ascending}}),
#"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"Custom", "Added Custom.Custom"}),
#"Merged Queries1" = Table.NestedJoin(#"Removed Columns1", {"class"}, course_1, {"class"}, "course_1", JoinKind.LeftOuter),
#"Expanded course_1" = Table.ExpandTableColumn(#"Merged Queries1", "course_1", {"Count"}, {"course_1.Count"}),
#"Merged Queries2" = Table.NestedJoin(#"Expanded course_1", {"class", "Added Custom.class"}, course_2, {"class", "Changed Type1.class"}, "course_2", JoinKind.LeftOuter),
#"Expanded course_2" = Table.ExpandTableColumn(#"Merged Queries2", "course_2", {"Count"}, {"course_2.Count"}),
#"Replaced Value" = Table.ReplaceValue(#"Expanded course_2",null,0,Replacer.ReplaceValue,{"course_2.Count"}),
#"Renamed Columns" = Table.RenameColumns(#"Replaced Value",{{"Added Custom.class", "class_compare"}, {"course_1.Count", "total_student"}, {"course_2.Count", "matched_student"}}),
#"Added Custom1" = Table.AddColumn(#"Renamed Columns", "percentage_matched", each [matched_student]/[total_student]),
#"Changed Type2" = Table.TransformColumnTypes(#"Added Custom1",{{"percentage_matched", Percentage.Type}})
in
#"Changed Type2"
Step-5: Disable table load for "course_1" and "course_2" if you wants by right click on the table and disable tick from besides "Enable Load".
Step-6: get back to report by clicking "Close & Apply" button.
Step-7: Change data type of "percentage_matched" column in table "course_3" as percentage.
Step-8: Add three column - class, class_compare and "percentage_matched" to your table visual. The final output should be something like below image-
Upvotes: 0