Megan Critchley
Megan Critchley

Reputation: 103

tidyr Pivot-wider: Duplicate issue

I am trying to use pivot wider to reduce the number of rows in my data and add new columns. However, the number of columns increases, but the number of rows remains the same. Ideally, each 'Indicator' should be one observation, where the DataYear, Company, Market, Country etc columns are the same. I think the issue may be due to duplicate observations, but dont understand how the IndicatorID column doesnt fix this issue?

A sample of my data:

    LongTest <- structure(list(DataYear = c(2018L, 2017L, 2016L, 2018L, 2017L, 
2016L, 2018L, 2017L, 2016L), Company = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "One", class = "factor"), Market = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Total", class = "factor"), 
    Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"), 
    ISO = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"), 
    Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Insurance", class = "factor"), 
    Division = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), Furtherdetails1 = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), Furtherdetails2 = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), Indicator = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Tax Avoidance", 
    "Turnover"), class = "factor"), IndicatorID = c(20L, 20L, 
    20L, 20L, 20L, 20L, 26L, 26L, 26L), InputName = structure(c(3L, 
    3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Number of employees", 
    "Profit before tax (Attributable to shareholder profit)", 
    "Tax Paid"), class = "factor"), InputCode = structure(c(2L, 
    2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("InputA", "InputB"
    ), class = "factor"), UnitRequired = structure(c(2L, 2L, 
    2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("#", "GBP"), class = "factor"), 
    Value = c(4.47e+08, 6.2e+08, 6.47e+08, 2.129e+09, 2.003e+09, 
    1.193e+09, 37628, 42431, 39833.44), UniqueID = 1:9), class = "data.frame", row.names = c(NA, 
-9L))

And the code I am currently using:

outTest <- pivot_wider(LongTest, names_from = InputCode, values_from = c(Value, UnitRequired, InputName))

When I use my full dataframe I get this error message:

Warning messages:
1: Values in `InputName` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(InputName = list)` to suppress this warning.
* Use `values_fn = list(InputName = length)` to identify where the duplicates arise
* Use `values_fn = list(InputName = summary_fun)` to summarise duplicates 
2: Values in `UnitRequired` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(UnitRequired = list)` to suppress this warning.
* Use `values_fn = list(UnitRequired = length)` to identify where the duplicates arise
* Use `values_fn = list(UnitRequired = summary_fun)` to summarise duplicates 
3: Values in `Value` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(Value = list)` to suppress this warning.
* Use `values_fn = list(Value = length)` to identify where the duplicates arise
* Use `values_fn = list(Value = summary_fun)` to summarise duplicates 

Ideal output would be something like this:

    structure(list(DataYear = c(2018L, 2017L, 2016L, 2018L, 2017L, 
2016L), Company = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "One", class = "factor"), 
    Market = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Total", class = "factor"), 
    Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"), 
    ISO = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"), 
    Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Insurance", class = "factor"), 
    Division = c(NA, NA, NA, NA, NA, NA), Furtherdetails1 = c(NA, 
    NA, NA, NA, NA, NA), Furtherdetails2 = c(NA, NA, NA, NA, 
    NA, NA), Indicator = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Tax Avoidance", 
    "Turnover"), class = "factor"), IndicatorID = c(20L, 20L, 
    20L, 26L, 26L, 26L), Value_InputA = c(2129000000L, 2003000000L, 
    1193000000L, NA, NA, NA), InputName_InputA = structure(c(2L, 
    2L, 2L, 1L, 1L, 1L), .Label = c("", "Profit before tax (Attributable to shareholder profit)"
    ), class = "factor"), UnitRequired_InputA = structure(c(2L, 
    2L, 2L, 1L, 1L, 1L), .Label = c("", "GBP"), class = "factor"), 
    Value_InputB = c(4.47e+08, 6.2e+08, 6.47e+08, 37628, 42431, 
    39833.44), InputName_InputB = structure(c(2L, 2L, 2L, 1L, 
    1L, 1L), .Label = c("Number of employees", "Tax Paid"), class = "factor"), 
    UnitRequired_InputB = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("#", 
    "GBP"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

Any help would be greatly appreciated!

Thanks

Upvotes: 1

Views: 3842

Answers (1)

Rui Barradas
Rui Barradas

Reputation: 76402

Using @Ronak Shah's suggestion in his comment to create a row column, the following seems to do it. I have added a second grouping column, Indicator.

library(tidyverse)

LongTest %>%
  group_by(InputCode, Indicator) %>% 
  mutate(row = row_number()) %>%
  pivot_wider(id_cols = c(row, Indicator),
              names_from = InputCode, 
              values_from = c(Value, UnitRequired, InputName)) %>%
  select(-row)

Upvotes: 3

Related Questions