Reputation: 11
I have a .dbf file I exported from ArcGIS 10.1 and I need to reorganize the it. An example of the data is:
V1 V2
40.000000000000000 41.000000000000000
40.000000000000000 42.000000000000000
41.000000000000000 40.000000000000000
41.000000000000000 42.000000000000000
41.000000000000000 43.000000000000000
42.000000000000000 40.000000000000000
42.000000000000000 41.000000000000000
42.000000000000000 43.000000000000000
43.000000000000000 41.000000000000000
43.000000000000000 42.000000000000000
I need the data in a format where there is only one row for each unique value in the first column, with all of the corresponding values from the second column now appearing in that row, for example:
V1 V2 V3 V4
40.000000000000000 41.000000000000000 42.000000000000000
41.000000000000000 40.000000000000000 42.000000000000000 43.000000000000000
42.000000000000000 40.000000000000000 41.000000000000000 43.000000000000000
43.000000000000000 41.000000000000000 42.000000000000000
If anyone can help me with this problem I would appreciate it. Thanks!
Upvotes: 1
Views: 1108
Reputation: 886998
You could also do this in dplyr
library(dplyr)
library(tidyr)
dat%>%
group_by(X1) %>%
mutate(Time=seq_along(X1))
%>%spread(Time,X2)
#Source: local data frame [4 x 4]
#X1 1 2 3
#1 40 41 42 NA
#2 41 40 42 43
#3 42 40 41 43
#4 43 41 42 NA
Upvotes: 2
Reputation: 193517
This is essentially a reshape
problem, but you don't have a "time" variable.
You can easily create a "time" variable as follows:
dat$time <- with(dat, ave(X1, X1, FUN = seq_along))
From there, use reshape
from base R...
reshape(dat, direction = "wide", idvar="X1", timevar="time")
# X1 X2.1 X2.2 X2.3
# 1 40 41 42 NA
# 3 41 40 42 43
# 6 42 40 41 43
# 9 43 41 42 NA
... or dcast
from "reshape2"...
library(reshape2)
dcast(dat, X1 ~ time, value.var="X2")
# X1 1 2 3
# 1 40 41 42 NA
# 2 41 40 42 43
# 3 42 40 41 43
# 4 43 41 42 NA
Upvotes: 1
Reputation: 44309
You can split up your data frame using the split
function on the first column and the use lapply
to extract your vectors:
dat = data.frame(X1=c(40, 40, 41, 41, 41, 42, 42, 42, 43, 43),
X2=c(41, 42, 40, 42, 43, 40, 41, 43, 41, 42))
res <- lapply(split(dat, dat[,1]), function(d) c(d[1,1], sort(unique(d[,2]))))
res
# $`40`
# [1] 40 41 42
#
# $`41`
# [1] 41 40 42 43
#
# $`42`
# [1] 42 40 41 43
#
# $`43`
# [1] 43 41 42
Most would probably prefer to keep the data in this format, but you can also combine the list into a matrix, right-padding the vectors with NA
values:
max.len <- max(unlist(lapply(res, length)))
do.call(rbind, lapply(res, function(x) { length(x) <- max.len ; x }))
# [,1] [,2] [,3] [,4]
# 40 40 41 42 NA
# 41 41 40 42 43
# 42 42 40 41 43
# 43 43 41 42 NA
Upvotes: 2