Reputation: 67
I've got a csv extract of data from a system in this kind of format (but with 16 question, answer and score groups - not just 4):
ClientID | MemberID | DateCompleted | Overall Score | Category1 Summary | Category2 Summary | Category3 Summary | Question1 | Answer1 | Question2 | Answer2 | Score 2 | Question3 | Answer3 | Score 3 | Question4 | Answer4 | Score 4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Client1 | [email protected] | 01/01/2023 12:00:00 | 100 | 100 | 100 | 100 | How Old Are You? | 25 | What's your favourite colour? | Blue | 5 | How long have you owned your house? | 4 Years | 4 | Can you Sail? | Yes | 1 |
Client1 | [email protected] | 01/02/2023 12:00:00 | 36 | 61 | 20 | 27 | How Old Are You? | 30 | What's your favourite colour? | Green | 4 | How long have you owned your house? | 10 Years | 6 | Can you Sail? | No | 0 |
Client2 | [email protected] | 01/01/2023 12:00:00 | 63 | 83 | 60 | 47 | How Old Are You? | 50 | What's your favourite colour? | Yellow | 3 | How long have you owned your house? | Over 20 Years | 10 | Can you Sail? | Yes | 1 |
Client2 | [email protected] | 01/02/2023 13:00:00 | 62 | 80 | 60 | 47 | How Old Are You? | 50 | What's your favourite colour? | Yellow | 3 | How long have you owned your house? | Over 20 Years | 10 | Can you Sail? | Yes | 1 |
It's questionnaire data and I haven't got any say over the format. One quirk is that MemberID is the unique identifier for a person, and a person can complete the questionnaire multiple times (so MemberID and DateCompleted make a row unique). That also means that a unique member could have a different age recorded on different rows if they've done the questionnaire years apart. Question 1 will always be 'how old are you', Question 2 will always be 'What's your favourite colour' etc.
So. I've been asked to get it into power bi and report on it, including Age ranges (e.g. less than 30 / 30-49 / over 50) I've got a solution using power query that's got it into a reportable model (outlined below), but it feels a bit clunky and I'm sure there's a neater / more efficient way of doing it.
I want to transform it into this model:
with the tables looking like this:
Questions:
QuestionID | Question |
---|---|
1 | How old are you? |
2 | What's your favourite colour? |
3 | How long have you owned your house? |
4 | Can you Sail? |
Member:
ClientID | MemberID | Age | AgeRange1 | AgeRangeOrder1 | AgeRange2 | AgeRangeOrder2 |
---|---|---|---|---|---|---|
Client1 | [email protected] | 25 | Under 30 | 1 | 20-30 | 1 |
Client1 | [email protected] | 30 | 30-50 | 2 | 30-40 | 2 |
Client2 | [email protected] | 50 | over 50 | 3 | 50-60 | 4 |
Client2 | [email protected] | 50 | over 50 | 3 | 50-60 | 4 |
AnswerHistory:
MemberID | MemberDatekey | DateCompleted | QuestionID | Answer | Score |
---|---|---|---|---|---|
[email protected] | [email protected]/01/23 12:00:00 | 01/01/2023 12:00 | 1 | 25 | 0 |
[email protected] | [email protected]/01/23 12:00:00 | 01/01/2023 12:00 | 2 | Blue | 5 |
[email protected] | [email protected]/01/23 12:00:00 | 01/01/2023 12:00 | 3 | 4 Years | 4 |
[email protected] | [email protected]/01/23 12:00:00 | 01/01/2023 12:00 | 4 | Yes | 1 |
[email protected] | [email protected]/02/23 12:00:00 | 01/02/2023 12:00 | 1 | 30 | 0 |
[email protected] | [email protected]/02/23 12:00:00 | 01/02/2023 12:00 | 2 | Green | 4 |
[email protected] | [email protected]/02/23 12:00:00 | 01/02/2023 12:00 | 3 | 10 Years | 6 |
[email protected] | [email protected]/02/23 12:00:00 | 01/02/2023 12:00 | 4 | No | 0 |
[email protected] | [email protected]/01/23 12:00:00 | 01/01/2023 12:00 | 1 | 50 | 0 |
[email protected] | [email protected]/01/23 12:00:00 | 01/01/2023 12:00 | 2 | Yellow | 3 |
[email protected] | [email protected]/01/23 12:00:00 | 01/01/2023 12:00 | 3 | Over 20 Years | 10 |
[email protected] | [email protected]/01/23 12:00:00 | 1/01/2023 12:00 | 4 | No | 1 |
[email protected] | [email protected]/02/23 13:00:00 | 01/02/2023 13:00 | 1 | 50 | 0 |
[email protected] | [email protected]/02/23 13:00:00 | 01/02/2023 13:00 | 2 | Yellow | 3 |
[email protected] | [email protected]/02/23 13:00:00 | 01/02/2023 13:00 | 3 | Over 20 Years | 10 |
[email protected] | [email protected]/02/23 13:00:00 | 01/02/2023 13:00 | 4 | Yes | 1 |
MemberSummaries:
memberid | memberdatekey | datecompleted | overall score | Category1 Summary | Category2 Summary | Category3 Summary |
---|---|---|---|---|---|---|
[email protected] | [email protected]/01/23 12:00:00 | 01/01/2023 | 100 | 100 | 100 | 100 |
[email protected] | [email protected]/02/23 12:00:00 | 01/02/2023 | 36 | 61 | 20 | 27 |
[email protected] | [email protected]/01/23 12:00:00 | 01/01/2023 | 63 | 83 | 60 | 47 |
[email protected] | [email protected]/02/23 13:00:00 | 01/02/2023 | 62 | 80 | 60 | 47 |
And the steps I took to get there in power query are:
Create a new 'questions' table with a 'QuestionID' and a Question. Hardcoded this in from the questions but would like to make this dynamic.
Duplicate the Main query, remove all the columns except memberid, date and Question1 / Answer 1
Insert a 'QuestionID column' and fill it with "1"
Repeat steps 2 & 3 for the other 15 questions so I end up with 16 queries (Q1 - Q16) with memberid, date, questionid, answer & score.
Append Q1 - Q16 into a new query - "AnswerHistory"
Merge memberid & date to get a unique key on AnswerHistory
Duplicate the main query, remove all the columns except member identifiers, date and the the summary scores - new Query -"MemberSummaries"
Merge memberID & date to get a unique key on MemberSummaries
Duplicate the main query, remove everything except the member identifiers, client and age.
Group to remove duplicates - new Query - "Member"
make some new columns to split by various age groups **
**I just realised I probably need the age stuff on MemberSummaries because it could change over time & if it does then I won't get unique members.
So this looks like a lot of steps
Having loads of queries to split up the questions & re-merge them into something sensible (to me) seems like overkill.
Is there a way to pivot the original data into the "AnswerHistory" format in one query rather than in 16 separate ones?
Cheers
Upvotes: 0
Views: 919
Reputation: 60344
Edited to include the custom function
You can generate the "AnswerHistory" table from your original table with just a single query.
Algorithm should be explained by the notes in the query
let
//read in the initial data table
Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
//Remove unneeded columns
#"Cols to Remove" = List.Select(Table.ColumnNames(Source), each _ = "Overall Score" or _ = "ClientID" or Text.EndsWith(_,"Summary")),
#"Remove Cols" = Table.RemoveColumns(Source,#"Cols to Remove"),
//Add the memberdatekey column
// This is not needed to develop this particular table
//and set it into the proper position
#"Add MemberDatekey" = Table.AddColumn(#"Remove Cols", "MemberDatekey",
each [MemberID] & DateTime.ToText([DateCompleted],"MM/dd/yyyy" & "#(lf)" & "hh:mm"), type text),
#"Reordered Columns" = Table.ReorderColumns(#"Add MemberDatekey", {"MemberID", "MemberDatekey"} & List.RemoveFirstN(Table.ColumnNames(#"Remove Cols"),1)),
//Unpivot columns except for the first three
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Reordered Columns", {"MemberID", "MemberDatekey", "DateCompleted"}, "Attribute", "Value"),
//Split column contents so as to create unique values to pivot on
//Note renaming of the split columns in the function from the default
#"Split Column by Character Transition" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute",
Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"Attribute", "QuestionID"}),
//Using a custom function, Pivot the "Attribute" column
// custom function needed because the basic Pivot with no aggregation function will return an error.
// This can also be done using native functions, if the custom function is too slow
#"Pivot No Aggregation" = fnPivotNoAggregation(#"Split Column by Character Transition","Attribute","Value"),
//Remove the Question text column as it is no longer needed
#"Removed Columns" = Table.RemoveColumns(#"Pivot No Aggregation",{"Question"}),
//Set the data types for the columns
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"MemberID", type text}, {"MemberDatekey", type text}, {"DateCompleted", type datetime}, {"QuestionID", Int64.Type}, {"Answer", type any}, {"Score ", Int64.Type}}),
//Replace the null score generated for "age" by 0 as per your spec
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,0,Replacer.ReplaceValue,{"Score "})
in
#"Replaced Value"
Custom Function
Enter as Blank Query and Rename: fnPivotNoAggregation
//credit: Cam Wallace https://www.dingbatdata.com/2018/03/08/non-aggregate-pivot-with-multiple-rows-in-powerquery/
//Rename: fnPivotNoAggregation
(Source as table,
ColToPivot as text,
ColForValues as text)=>
let
PivotColNames = List.Buffer(List.Distinct(Table.Column(Source,ColToPivot))),
#"Pivoted Column" = Table.Pivot(Source, PivotColNames, ColToPivot, ColForValues, each _),
TableFromRecordOfLists = (rec as record, fieldnames as list) =>
let
PartialRecord = Record.SelectFields(rec,fieldnames),
RecordToList = Record.ToList(PartialRecord),
Table = Table.FromColumns(RecordToList,fieldnames)
in
Table,
#"Added Custom" = Table.AddColumn(#"Pivoted Column", "Values", each TableFromRecordOfLists(_,PivotColNames)),
#"Removed Other Columns" = Table.RemoveColumns(#"Added Custom",PivotColNames),
#"Expanded Values" = Table.ExpandTableColumn(#"Removed Other Columns", "Values", PivotColNames)
in
#"Expanded Values"
If you desire the clients to come out in a specific order, some sorting steps may need to be added
Upvotes: 0