Reputation: 557
I am having issue with rearranging some data.
The original data is:
structure(list(id = 1:3, artery.1 = structure(c(1L, 1L, 2L), .Label = c("a",
"b"), class = "factor"), artery.2 = structure(c(1L, NA, 2L), .Label = c("b",
"c"), class = "factor"), artery.3 = structure(c(1L, NA, 2L), .Label = c("c",
"d"), class = "factor"), artery.4 = structure(c(NA, NA, 1L), .Label = "e", class = "factor"), artery.5 = structure(c(NA, NA, 1L), .Label = "f", class = "factor"),
diameter.1 = c(3L, 2L, 1L), diameter.2 = c(2L, NA, 2L), diameter.3 = c(3L,
NA, 3L), diameter.4 = c(NA, NA, 4L), diameter.5 = c(NA, NA,
5L)), .Names = c("id", "artery.1", "artery.2", "artery.3",
"artery.4", "artery.5", "diameter.1", "diameter.2", "diameter.3",
"diameter.4", "diameter.5"), class = "data.frame", row.names = c(NA,
-3L))
# id artery.1 artery.2 artery.3 artery.4 artery.5 diameter.1 diameter.2 diameter.3 diameter.4 diameter.5
# 1 1 a b c <NA> <NA> 3 2 3 NA NA
# 2 2 a <NA> <NA> <NA> <NA> 2 NA NA NA NA
# 3 3 b c d e f 1 2 3 4 5
I would like to get to this:
structure(list(id = 1:3, a = c(3L, 2L, NA), b = c(2L, NA, 1L),
c = c(3L, NA, 2L), d = c(NA, NA, 3L), e = c(NA, NA, 4L),
f = c(NA, NA, 5L)), .Names = c("id", "a", "b", "c", "d",
"e", "f"), class = "data.frame", row.names = c(NA, -3L))
# id a b c d e f
# 1 1 3 2 3 NA NA NA
# 2 2 2 NA NA NA NA NA
# 3 3 NA 1 2 3 4 5
Basically, a
to f
represents arteries and the numerical values represent the corresponding diameter. Each row represents a patient.
Is there a neat way to sort this dataframe out?
Upvotes: 1
Views: 91
Reputation: 11957
Modern tidyr makes the solution even more succinct via the pivot_
functions:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(-id, names_pattern = '(artery|diameter)\\.(\\d+)', names_to = c('.value', NA)) %>%
filter(!is.na(artery)) %>%
pivot_wider(names_from = artery, values_from = diameter)
id a b c d e f
<int> <int> <int> <int> <int> <int> <int>
1 1 3 2 3 NA NA NA
2 2 2 NA NA NA NA NA
3 3 NA 1 2 3 4 5
Here is the older solution, which uses the deprecated gather
and spread
functions:
library(dplyr)
library(tidyr)
new.df <- gather(df, variable, value, artery.1:diameter.5) %>%
separate(variable, c('variable', 'num')) %>%
spread(variable, value) %>%
subset(!is.na(artery)) %>%
mutate(diameter = as.numeric(diameter)) %>%
select(-num) %>%
spread(artery, diameter)
Output:
id a b c d e f
1 1 3 2 3 NA NA NA
2 2 2 NA NA NA NA NA
3 3 NA 1 2 3 4 5
Upvotes: 3
Reputation: 92300
Or using melt
/dcast
combination with data.table
while selecting variables using regex in the patterns
function
library(data.table) #v>=1.9.6
dcast(melt(setDT(df),
id = "id",
measure = patterns("artery", "diameter")),
id ~ value1,
sum,
value.var = "value2",
subset = .(!is.na(value2)),
fill = NA)
# id a b c d e f
# 1: 1 3 2 3 NA NA NA
# 2: 2 2 NA NA NA NA NA
# 3: 3 NA 1 2 3 4 5
As you can see, both melt
and dcast
are very flexible and you can use regex, specify a subset, pass multiple functions and specify how you want to fill missing values.
Upvotes: 2
Reputation: 35324
This can be done with two reshape()
calls. First, we can longify both artery
and diameter
on id
, then widen with artery
as the time variable. To prevent a column of NAs, we also must subset out rows with NA values for artery
in the intermediate frame.
reshape(subset(reshape(df,dir='l',varying=setdiff(names(df),'id'),timevar=NULL),!is.na(artery)),dir='w',timevar='artery');
## id diameter.a diameter.b diameter.c diameter.d diameter.e diameter.f
## 1.1 1 3 2 3 NA NA NA
## 2.1 2 2 NA NA NA NA NA
## 3.1 3 NA 1 2 3 4 5
The diameter.
prefixes can be removed afterward, if desired. However, an advantage of this solution is that it would be capable of preserving multiple column sets, whereas the xtabs()
solution cannot. The prefixes would be essential to distinguish the column sets in that case.
Upvotes: 1
Reputation: 215117
You can use xtabs
with reshape
from base R. Use the latter to transform data to long format and use the former to get the count table:
xtabs(diameter ~ id + artery, reshape(df, varying = 2:11, sep = '.', dir = "long"))
# artery
#id a b c d e f
# 1 3 2 3 0 0 0
# 2 2 0 0 0 0 0
# 3 0 1 2 3 4 5
Upvotes: 1