Reputation: 770
I have data as below:
IDnum zipcode City County State
10011 36006 Billingsley Autauga AL
10011 36022 Deatsville Autauga AL
10011 36051 Marbury Autauga AL
10011 36051 Prattville Autauga AL
10011 36066 Prattville Autauga AL
10011 36067 Verbena Autauga AL
10011 36091 Selma Autauga AL
10011 36703 Jones Autauga AL
10011 36749 Plantersville Autauga AL
10011 36758 Uriah Autauga AL
10011 36480 Atmore Autauga AL
10011 36502 Bon Secour Autauga AL
I have a list of zipcodes, the cities they encompass, and counties/states they are located in. IDnum = numeric value for county and state, combined. List is in format you see now, I need to reshape it from long to wide / vertical to horizontal, where the IDnum variable becomes the unique identifier, and all other possible value combinations become wide variables.
IDnum zip1 city1 county1 state1 zip2 city2 county2
10011 36006 Billingsley Autauga AL 36022 Deatsville Autauga
This is just sample of the dataset, it encompasses every zip in the USA and includes more variables. I have seen other questions and answers similar to this one, but not where there are multiple values in almost every column.
There are commands in SPSS and STATA that will reshape data this way, in SPSS I can run a Restructure/Cases to Vars command that turns 11 variables in my initial dataset into about 1750, b/c one county has over 290 zips and it replicates most of the other variables 290+ times. This will create many blanks, but I need it to be reshaped into one very long horizontal file.
I have looked at reshape and reshape2, and am hung up on the 'default to length' error message. I did get melt/dcast to sorta work, but this creates one variable that is a list of all values, rather than creating variables for each value.
melted_dupes <- melt(zip_code_list_dupes, id.vars= c("IDnum"))
HRZ_dupes <- dcast(melted_dupes, IDnum ~ variable, fun.aggregate = list)
I have tried tidyr and dplyr but got lost in syntax. Am a little surprised there isn't a command the data similar to built in commands in other packages, making me assume there is, and I just haven't figured it out.
Any help is appreciated.
Upvotes: 0
Views: 1218
Reputation: 16121
There might be a more efficient way, but try the following. I used my own (example) dataset, very similar to yours. Run the process step by step to see how it works, as you'll have to modify some things in the code.
library(dplyr)
library(tidyr)
# get example data
dt = data.frame(id = c(1,1,1,2,2),
zipcode = c(4,5,6,7,8),
city = c("A","B","C","A","C"),
county = c("A","B","C","A","C"),
state = c("A","B","C","A","C"))
dt
# id zipcode city county state
# 1 1 4 A A A
# 2 1 5 B B B
# 3 1 6 C C C
# 4 2 7 A A A
# 5 2 8 C C C
# get maximum number of rows for a single id
# this will help you get the wide format
max_num_rows = max((dt %>% count(id))$n)
# get names of columns to reshape
col_names = names(dt)[-1]
dt %>%
group_by(id) %>%
mutate(nrow = paste0("row",row_number())) %>%
unite_("V",col_names) %>%
spread(nrow, V) %>%
unite("z",matches("row")) %>%
separate(z, paste0(col_names, sort(rep(1:max_num_rows, ncol(dt)-1))), convert=T) %>%
ungroup()
# # A tibble: 2 × 13
# id zipcode1 city1 county1 state1 zipcode2 city2 county2 state2 zipcode3 city3 county3 state3
# * <dbl> <int> <chr> <chr> <chr> <int> <chr> <chr> <chr> <int> <chr> <chr> <chr>
# 1 1 4 A A A 5 B B B 6 C C C
# 2 2 7 A A A 8 C C C NA <NA> <NA> <NA>
Upvotes: 0
Reputation: 26436
You can do this with the base function reshape
after adding in a consecutive count by IDnum
. Assuming your data is stored in a data.frame
named df
:
df2 <- within(df, count <- ave(rep(1,nrow(df)),df$IDnum,FUN=cumsum))
Provides a new column of the consecutive count named "time". And now we can reshape
to wide format
reshape(df2,direction="wide",idvar="IDnum",timevar="count")
IDnum zipcode.1 City.1 County.1 State.1 zipcode.2 City.2 County.2 State.2 zipcode.3 City.3 County.3 State.3 zipcode.4 City.4 County.4 State.4 1 10011 36006 Billingsley Autauga AL 36022 Deatsville Autauga AL 36051 Marbury Autauga AL 36051 Prattville Autauga AL
(output truncated, goes all the way to zipcode.12, etc.)
Upvotes: 2