rg4s
rg4s

Reputation: 897

How can I unnest a vector from a dataframe?

I have a dataframe in a long format with companies and their estimates by regions. I want to build a wide table where I could see how many positive and negative estimates were given to a company by region. When I try pivot_wider I recieve a dataframe with vectors in its cells. That is okay, however I am unable to count the number of positive and negative feedbacks by regions. Also tried using unnest and unnest_longer functions. The latter though seems to resolve my problem, but it takes only one argument for column to unnest.

How can I, may be, modify my pivot_wider to get the desirable result?

My dataframe:

set.seed(1407)
test_df <- data.frame(code = rep(c("positive", "negative"), 9),
                      company = c("Google", "Amazon", "SpaceX", "BlueOrigin",
                                  "Google", "Western Digital", "Aliexpress",
                                  "Tencent", "Aliexpress"),
                      n = rbinom(18, size = 9, prob = 0.5),
                      region = c("Asia", "Europe", "Middle East")) 

What I get using a function to wide the table:

test_df %>% 
  pivot_wider(id_cols = region,
              names_from = code,
              values_from = n)

# A tibble: 3 x 3
  region      positive  negative 
  <chr>       <list>    <list>   
1 Asia        <int [3]> <int [3]>
2 Europe      <int [3]> <int [3]>
3 Middle East <int [3]> <int [3]>

My desirable output:

region      positive  negative

Asia            4        2
Asia            3        5
Asia            5        2
Europe          3        5
Europe          6        4
Europe          5        1
Middle East     8        5
Middle East     6        5
Middle East     6        2

Upvotes: 3

Views: 182

Answers (2)

Anoushiravan R
Anoushiravan R

Reputation: 21938

You can modify your solution in the followin way. When the combination of id_cols and value does not uniquely identify an observation the result will be a named list. You may notice that you have more than one combination of Asia and n == 5 for example so I decided to use all remaining columns instead of those specified in names_from and values_from in id_cols as is the default choice than just region.

library(tidyr)

test_df %>% 
  pivot_wider(names_from = code,
              values_from = n) %>%
  arrange(region)

# A tibble: 9 x 4
  company         region      positive negative
  <chr>           <chr>          <int>    <int>
1 Google          Asia               4        5
2 BlueOrigin      Asia               5        2
3 Aliexpress      Asia               3        2
4 Amazon          Europe             6        5
5 Google          Europe             3        1
6 Tencent         Europe             5        4
7 SpaceX          Middle East        8        5
8 Western Digital Middle East        6        5
9 Aliexpress      Middle East        6        2

Upvotes: 3

ThomasIsCoding
ThomasIsCoding

Reputation: 102241

A base R option using reshape

reshape(
    test_df,
    direction = "wide",
    idvar = c("company", "region"),
    timevar = "code"
)

gives

          company      region n.positive n.negative
1          Google        Asia          5          6
2          Amazon      Europe          4          6
3          SpaceX Middle East          2          2
4      BlueOrigin        Asia          5          6
5          Google      Europe          5          3
6 Western Digital Middle East          2          4
7      Aliexpress        Asia          6          3
8         Tencent      Europe          5          3
9      Aliexpress Middle East          4          4

Upvotes: 2

Related Questions