Nneka
Nneka

Reputation: 1860

from long data table to wide data table

_aI am trying to transform the dataset. What i want to achieve is that if Intensitat = 1 - R creates a new column - Intensity 1a - and writes the value of damage_a. If Intensitat = 2 - R creates another new column - Intensity 2a - and writes the values of damage_a. Finally if Intensitat = 3 then R does new column - Intensity 3a - and writes the values of damage_a. (it always writes the value of damage_x corresponding to the same row as the intensity indicator).

Then I will run the same process: create 3 columns from an indicator variables - each column for an indicator value and in each column use the values of damage_b.

Lastly the same again but use the values of damage_c.

type <- sample(seq(from = 1, to = 5, by = 1), size = 50, replace = TRUE)
type

intensity <- sample(seq(from = 1, to = 3, by = 1), size = 50, replace = TRUE)
intensity

damage_a <- sample(seq(from = 1, to = 50, by = 1), size = 50, replace = TRUE)
damage_b <- sample(seq(from = 1, to = 200, by = 1), size = 50, replace = TRUE)
damage_c <- sample(seq(from = 1, to = 500, by = 1), size = 50, replace = TRUE)

dat <- data.frame(type, intensity, damage_a, damage_b, damage_c)

# want to  evaluate if the difference in mean  between different damages 
  dat1 <- subset(dat, select = c(type, intensity, damage_a))
  dat2 <- subset(dat, select = c(type, intensity, damage_b))
  dat3 <- subset(dat, select = c(type, intensity, damage_c))

It might seem like an very easy task for you, but I have been stuck on this for two days. I have tried applyin tidyr function, ifelse comands, etc but never obtained what i needed. Any comments/ideas are very appriciated. Thank you.

EDIT:

I managed to change the format if the table with this little handy function.

df.out <- unstack(dat, damage_a ~ intensity)

The problem now is that i get lists that differ in lenght. How can i combine these lists into ona data frame?

Upvotes: 1

Views: 61

Answers (2)

akrun
akrun

Reputation: 887531

We can use dcast from data.table which can take multiple value.var columns. As there are multiple observations per group, we can use the rowid function to create a sequence variable by 'type' and 'intensity' in the formula. This will result in a new column 'type_1', but it can be assigned to NULL later

library(data.table)
dcast(setDT(dat), type + rowid(type, intensity) ~ paste0("intensity_", 
  intensity), value.var = c('damage_a', 'damage_b', 'damage_c'))[, type_1 := NULL][]

Upvotes: 1

mkearney
mkearney

Reputation: 1345

Check out the gather and spread functions in tidyr, e.g.

dat <- tidyr::spread(dat, damage, value, -intensity, -type)   
tidyr::spread(dat, damage, value)

Upvotes: 0

Related Questions