Reputation: 1000
I have a dataframe that looks like the following. There are several variables (like "c" and "z") with measurements for health, animals, enviro, and money. In the actual dataframe, there are many other columns that do not follow this pattern and are interspersed throughout.
id c_health c_animals c_enviro c_money z_health z_animals z_enviro z_money
1 3 2 4 5 7 9 6 8
2 2 3 5 4 8 7 6 9
3 4 1 2 3 9 6 8 7
I am trying to rearrange the data to make it "tidy". I am not sure what to do when there are several variables like in my current dataset. This is the kind of result I would eventually like to end up with:
id c z message
1 3 7 health
1 2 9 animals
1 4 6 enviro
1 5 8 money
2 2 8 health
2 3 7 animals
2 5 6 enviro
2 4 9 money
3 4 9 health
3 1 6 animals
3 2 8 enviro
3 3 7 money
If the dataframe just included the following columns, I could make it tidy in the following way:
id c_health c_animals c_enviro c_money
1 3 2 4 5
2 2 3 5 4
3 4 1 2 3
df <- df %>%
gather(., key = "question", value = "response", 2:5)
Upvotes: 0
Views: 208
Reputation: 6931
You are on the right track with using gather
, but need some additional steps to split the prefix off the column names. Try the following:
library(dplyr)
library(tidyr)
df = data.frame(
id = c(1,2,3),
c_health = c(3,2,4),
c_animals = c(2,3,1),
z_health = c(7,8,9),
z_animals = c(9,7,6),
stringsAsFactors = FALSE
)
output = df %>%
# gather on all columns other than id
gather(key = "question", value = "response", -all_of("id")) %>%
# split off prefix and rest of column name
mutate(prefix = substr(question,1,1),
desc = substr(question,3,nchar(question))) %>%
# keep just the columns of interest
select(id, prefix, desc, response) %>%
# reshape wider
spread(prefix, response)
Update - my comment on differing prefix lengths does not return the correct answer. Because []
indexing does not work that way inside mutate. Same idea but correct syntax as follows:
output = df %>%
# gather on all columns other than id
gather(key = "question", value = "response", -all_of("id")) %>%
# split off prefix and rest of column name
mutate(split = strsplit(question, "_")) %>%
mutate(prefix = sapply(split, function(x){x[1]}),
desc = sapply(split, function(x){x[2]})) %>%
# keep just the columns of interest
select(id, prefix, desc, response) %>%
# reshape wider
spread(prefix, response)
Upvotes: 1
Reputation: 81
You can do this with the tidyr package and pivot_longer:
library(tidyr)
library(dplyr)
df %>%
pivot_longer(cols = 2:ncol(df),
names_to = c(".value", "message"),
names_sep = "_")
Upvotes: 2