MonkeyMonkey
MonkeyMonkey

Reputation: 160

Excel to SQL and Handling Duplicate Values

Lookup Table in Worksheet 1.

ID      CODE  REASSIGN(CODE)
W12-05  AA    ZZ
W12-05  BB    ZZ
W14-01  CC    CC
W14-03  AA    AA
W15-01  DD    DD
W15-01  DD    DD

ID and CODE are generated from the report

REASSIGN(CODE) is what I am trying to accomplish. If there are multiple IDs (ex. W12-05) that are the same with different CODE values, then assign it to "ZZ" . However, if there are multiple IDs and they all have the same CODE values (ex. W15-01), then assign it to the same code value.

There are 2 Worksheets (Worksheet 2 and Worksheet 3) that house the ID and different data associated with that ID.

Data Table:

ID      LOOKUP REASSIGN(CODE)
W12-05  ZZ
W14-01  CC
W14-03  AA
W15-01  DD

How can I accomplish the REASSIGN(CODE) column in the first table? Something along the lines of:

If IDs duplicate and multiple different CODEs, assign to ZZ. If IDs duplicate and same codes, assign to same CODE.

There might be instances where if there certain multiple codes for duplicate IDs, it will need to assign to a specific CODE. "ZZ" is the default when I don't know where it should go and it can be handled by another department. Sometimes I know that code "EE" and code "FF" belong to one department and there is logic to assign it to "GG" for example.

I want to combine all 3 Worksheets based on the ID column and I want all the data from Worksheet 2 and Worksheet 3 to be there, even if there isn't a match in IDs. Worksheet 1 will be used just as a lookup to pull in the REASSIGN(CODE).

Please let me know if I need to explain further or if there are any questions. I am open to ideas and other solutions, since this is just the logic that came into my head. I am not sure if it's better to create another table for the REASSIGN(CODE) or if my logic is sound.

Original Question - RESOLVED

Combination of IF, REPLACE, and & - I am happy to provide more details.

I have 3 workbooks I am pulling data from and I do perform some data manipulation.

Here are the formulas I currently have in my spreadsheet:

Column: OB

Formula: =SUBSTITUTE(SUBSTITUTE([@[OBN]]," ",""),"-","")

This cleans up the OBN column removing all hyphens and spaces to standardize it based on the data table.

Column: DO

Formula:

=IF(OR([@[DON]]="NA",ISBLANK(TRIM([@[DON]])),[@[DON]]="N/A",ISNA([@[DON]]),ISBLANK([@[DON]])),"",[@[DON]])

This makes the DO column blank if there is no value there or if it's NA or N/A, as that is how the report displays it. Otherwise, use the value of DON as DO.

Column: OB+DO (This is the ID in the tables below)

Formula:

=IF([@DO]="",[@OB],CONCATENATE([@OB],"-",[@DO]))

Concatenate columns OB and DO with a hyphen if DO contains a value that is not NA or N/A. Otherwise, just use OB.

Upvotes: 1

Views: 260

Answers (1)

MonkeyMonkey
MonkeyMonkey

Reputation: 160

I had to move to SQL since Excel was not handling the data very well. I have it fully functioning now in SQL. If anyone is curious on the solution, I have posted it below.

SELECT [Table1].[ID], [Table1].[CODE], Count([Table1].[CODE]) AS [CODE Count] FROM [Table1] GROUP BY [Table1].[ID], [Table1].[CODE];

I first do a count of CODE and group by ID and CODE. This gets me the count of each CODE per ID. For my example above, W12-05 would be listed twice and would have a count of 1 and 1, results in the table below.

Table2

ID  CODE    CODE Count
W12-05  AA  1
W12-05  BB  1
W14-01  CC  1
W14-03  AA  1
W15-01  DD  2

Then, I do a count of ID's and grouped by ID.

SELECT [Table1].[ID], Count([Table1].[ID]) AS [ID Count]
FROM [Table1]
GROUP BY [Table1].[ID];

Table3

ID  ID Count
W12-05  2
W14-01  1
W14-03  1
W15-01  2

This will get me the total count of each ID.

Next, I compare the ID count and CODE Count for each ID.

SELECT [Table2].[ID], [Table2].[CODE], [Table2].[CODE Count], [Table3].[Count ID]
FROM [Table2] LEFT JOIN [Table3] ON [Table2].[ID] = [Table3].[ID];

This yields the following table.

Table4

ID     CODE CODE Count  ID Count
W12-05  AA       1          2
W12-05  BB       1          2
W14-01  CC       1          1
W14-03  AA       1          1
W15-01  DD       2          2

The last step I do is compare the count of CODE and ID.

SELECT [Table4].[ID], [Table4].[CODE Count], [Table4].[Count ID], IIf([Table4].[CODE Count]<>[Table4].[Count ID],"ZZ", [Table4].CODE) AS CODE
FROM [Table4];

If the CODE Count and ID Count do not match, assign it to "ZZ".

Table5

ID    CODE Count  ID Count  CODE
W12-05    1            2     ZZ
W12-05    1            2     ZZ
W14-01    1            1     CC
W14-03    1            1     AA
W15-01    2            2     DD

From here, you can just take the DISTINCT values.

SELECT DISTINCT [Table5].[ID], [Table5].CODE
FROM [Table5];

ID     CODE
W12-05  ZZ
W14-01  CC
W14-03  AA
W15-01  DD

I am not sure if this is the optimal way, but it is a working solution for me. I hope this helps!

Upvotes: 2

Related Questions