Reputation: 103
I am trying to use pivot wider to reduce the number of rows in my data and add new columns. However, the number of columns increases, but the number of rows remains the same. Ideally, each 'Indicator' should be one observation, where the DataYear, Company, Market, Country etc columns are the same. I think the issue may be due to duplicate observations, but dont understand how the IndicatorID column doesnt fix this issue?
A sample of my data:
LongTest <- structure(list(DataYear = c(2018L, 2017L, 2016L, 2018L, 2017L,
2016L, 2018L, 2017L, 2016L), Company = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = "One", class = "factor"), Market = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Total", class = "factor"),
Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"),
ISO = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"),
Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Insurance", class = "factor"),
Division = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), Furtherdetails1 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), Furtherdetails2 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), Indicator = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Tax Avoidance",
"Turnover"), class = "factor"), IndicatorID = c(20L, 20L,
20L, 20L, 20L, 20L, 26L, 26L, 26L), InputName = structure(c(3L,
3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Number of employees",
"Profit before tax (Attributable to shareholder profit)",
"Tax Paid"), class = "factor"), InputCode = structure(c(2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("InputA", "InputB"
), class = "factor"), UnitRequired = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("#", "GBP"), class = "factor"),
Value = c(4.47e+08, 6.2e+08, 6.47e+08, 2.129e+09, 2.003e+09,
1.193e+09, 37628, 42431, 39833.44), UniqueID = 1:9), class = "data.frame", row.names = c(NA,
-9L))
And the code I am currently using:
outTest <- pivot_wider(LongTest, names_from = InputCode, values_from = c(Value, UnitRequired, InputName))
When I use my full dataframe I get this error message:
Warning messages:
1: Values in `InputName` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(InputName = list)` to suppress this warning.
* Use `values_fn = list(InputName = length)` to identify where the duplicates arise
* Use `values_fn = list(InputName = summary_fun)` to summarise duplicates
2: Values in `UnitRequired` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(UnitRequired = list)` to suppress this warning.
* Use `values_fn = list(UnitRequired = length)` to identify where the duplicates arise
* Use `values_fn = list(UnitRequired = summary_fun)` to summarise duplicates
3: Values in `Value` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(Value = list)` to suppress this warning.
* Use `values_fn = list(Value = length)` to identify where the duplicates arise
* Use `values_fn = list(Value = summary_fun)` to summarise duplicates
Ideal output would be something like this:
structure(list(DataYear = c(2018L, 2017L, 2016L, 2018L, 2017L,
2016L), Company = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "One", class = "factor"),
Market = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Total", class = "factor"),
Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"),
ISO = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"),
Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Insurance", class = "factor"),
Division = c(NA, NA, NA, NA, NA, NA), Furtherdetails1 = c(NA,
NA, NA, NA, NA, NA), Furtherdetails2 = c(NA, NA, NA, NA,
NA, NA), Indicator = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Tax Avoidance",
"Turnover"), class = "factor"), IndicatorID = c(20L, 20L,
20L, 26L, 26L, 26L), Value_InputA = c(2129000000L, 2003000000L,
1193000000L, NA, NA, NA), InputName_InputA = structure(c(2L,
2L, 2L, 1L, 1L, 1L), .Label = c("", "Profit before tax (Attributable to shareholder profit)"
), class = "factor"), UnitRequired_InputA = structure(c(2L,
2L, 2L, 1L, 1L, 1L), .Label = c("", "GBP"), class = "factor"),
Value_InputB = c(4.47e+08, 6.2e+08, 6.47e+08, 37628, 42431,
39833.44), InputName_InputB = structure(c(2L, 2L, 2L, 1L,
1L, 1L), .Label = c("Number of employees", "Tax Paid"), class = "factor"),
UnitRequired_InputB = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("#",
"GBP"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
Any help would be greatly appreciated!
Thanks
Upvotes: 1
Views: 3842
Reputation: 76402
Using @Ronak Shah's suggestion in his comment to create a row
column, the following seems to do it. I have added a second grouping column, Indicator
.
library(tidyverse)
LongTest %>%
group_by(InputCode, Indicator) %>%
mutate(row = row_number()) %>%
pivot_wider(id_cols = c(row, Indicator),
names_from = InputCode,
values_from = c(Value, UnitRequired, InputName)) %>%
select(-row)
Upvotes: 3