Reputation: 163
I have a wide data set with the following variable names (a simplified example see below).
df <- structure(list(id = structure(1:4, .Label = c("00000001", "00000002",
"00000003", "00000004"), class = "factor"),
r1weight = c(56, 76, 87, 64),
r2weight = c(57, 75, 88, 66),
r3weight = c(56, 76, 87, 65),
r4weight = c(56L, 73L, 85L, 63L),
r5weight = c(55L, 77L, 84L, 65L),
r1height = c(151L, 163L, 173L, 153L),
r2height = c(154L, 164L, NA, 154L),
r3height = c(NA, 165L, NA, 152L),
r4height = c(153L, 162L, 172L, 154L),
r5height = c(152,161,171,154),
bmi2002 = c(27.0, 23.9,20.1,23.9),
bmi2004 = c(20.0, 29.9,21.1,27.3),
bmi2006 = c(21.0, 21.9,23.1,24.3),
bmi2008 = c(21.2, 21.2,23.4,24.5),
bmi2010 = c(24.1, 25.9, 26.2,28.3)),
class ="data.frame", row.names = c(NA, -4L))
> df
id r1weight r2weight r3weight r4weight r5weight r1height r2height r3height r4height r5height bmi2002 bmi2004 bmi2006 bmi2008 bmi2010
1 00000001 56 57 56 56 55 151 154 NA 153 152 27.0 20.0 21.0 21.2 24.1
2 00000002 76 75 76 73 77 163 164 165 162 161 23.9 29.9 21.9 21.2 25.9
3 00000003 87 88 87 85 84 173 NA NA 172 171 20.1 21.1 23.1 23.4 26.2
4 00000004 64 66 65 63 65 153 154 152 154 154 23.9 27.3 24.3 24.5 28.3
I want to convert the wide format data set to the long format data set. I have tried the reshape function, but I don't know how to use reshape function when the naming pattern of longitudinal variables is different. Also, reshape function requires me to specify each variable set, and it's a lot of work in my real analysis.
Is there any easier way for me to handle reshaping data from wide to long in my case?
For your reference, I have attached the name list of the real data that I need to handle:
[1] "hhid" "rahhidpn.x" "pn" "reshhidpn.x" "r1agey_e" "r2agey_e"
[7] "r3agey_e" "r4agey_e" "r5agey_e" "r6agey_e" "r7agey_e" "r8agey_e"
[13] "r9agey_e" "r10agey_e" "r11agey_e" "r12agey_e" "r13agey_e" "r14agey_e"
[19] "respagey_e" "r1mstat" "r2mstat" "r3mstat" "r4mstat" "r5mstat"
[25] "r6mstat" "r7mstat" "r8mstat" "r9mstat" "r10mstat" "r11mstat"
[31] "r12mstat" "r13mstat" "r14mstat" "remstat" "rabyear" "ragender"
[37] "rabplace" "raedyrs" "raedegrm" "r1jcocc" "r2jcocc" "r3jcocc"
[43] "r4jcocc" "r5jcocc" "r6jcocc" "r7jcocc" "r8jcocc" "r9jcocc"
[49] "r10jcocc" "r11jcocc" "r12jcocc" "r13jcocc" "r14jcocc" "r7mdactx"
[55] "r8mdactx" "r9mdactx" "r10mdactx" "r11mdactx" "r12mdactx" "r13mdactx"
[61] "r14mdactx" "r7ltactx" "r8ltactx" "r9ltactx" "r10ltactx" "r11ltactx"
[67] "r12ltactx" "r13ltactx" "r14ltactx" "r3drinkn" "r4drinkn" "r5drinkn"
[73] "r6drinkn" "r7drinkn" "r8drinkn" "r9drinkn" "r10drinkn" "r11drinkn"
[79] "r12drinkn" "r13drinkn" "r14drinkn" "r1smoken" "r2smoken" "r3smoken"
[85] "r4smoken" "r5smoken" "r6smoken" "r7smoken" "r8smoken" "r9smoken"
[91] "r10smoken" "r11smoken" "r12smoken" "r13smoken" "r14smoken" "r1bmi"
[97] "r2bmi" "r3bmi" "r4bmi" "r5bmi" "r6bmi" "r7bmi"
[103] "r8bmi" "r9bmi" "r10bmi" "r11bmi" "r12bmi" "r13bmi"
[109] "r14bmi" "r8pmbmi" "r9pmbmi" "r10pmbmi" "r11pmbmi" "r12pmbmi"
[115] "r13pmbmi" "r14pmbmi" "r2cesd" "r3cesd" "r4cesd" "r5cesd"
[121] "r6cesd" "r7cesd" "r8cesd" "r9cesd" "r10cesd" "r11cesd"
[127] "r12cesd" "r13cesd" "r14cesd" "r1strok" "r2strok" "r3strok"
[133] "r4strok" "r5strok" "r6strok" "r7strok" "r8strok" "r9strok"
[139] "r10strok" "r11strok" "r12strok" "r13strok" "r14strok" "r1diab"
[145] "r2diab" "r3diab" "r4diab" "r5diab" "r6diab" "r7diab"
[151] "r8diab" "r9diab" "r10diab" "r11diab" "r12diab" "r13diab"
[157] "r14diab" "r1heart" "r2heart" "r3heart" "r4heart" "r5heart"
[163] "r6heart" "r7heart" "r8heart" "r9heart" "r10heart" "r11heart"
[169] "r12heart" "r13heart" "r14heart" "r1hibp" "r2hibp" "r3hibp"
[175] "r4hibp" "r5hibp" "r6hibp" "r7hibp" "r8hibp" "r9hibp"
[181] "r10hibp" "r11hibp" "r12hibp" "r13hibp" "r14hibp" "rahhidpn.y"
[187] "reshhidpn.y" "h1atotb" "h2atotb" "h3atotb" "h4atotb" "h5atotb"
[193] "h6atotb" "h7atotb" "h8atotb" "h9atotb" "h10atotb" "h11atotb"
[199] "h12atotb" "h13atotb" "h14atotb" "imrc_imp2018" "imrc_imp2016" "imrc_imp1995"
[205] "imrc_imp1996" "imrc_imp1998" "imrc_imp2000" "imrc_imp2002" "imrc_imp2004" "imrc_imp2006"
[211] "imrc_imp2008" "imrc_imp2010" "imrc_imp2012" "imrc_imp2014" "dlrc_imp2018" "dlrc_imp2016"
[217] "dlrc_imp1995" "dlrc_imp1996" "dlrc_imp1998" "dlrc_imp2000" "dlrc_imp2002" "dlrc_imp2004"
[223] "dlrc_imp2006" "dlrc_imp2008" "dlrc_imp2010" "dlrc_imp2012" "dlrc_imp2014" "ser7_imp2018"
[229] "ser7_imp2016" "ser7_imp1995" "ser7_imp1996" "ser7_imp1998" "ser7_imp2000" "ser7_imp2002"
[235] "ser7_imp2004" "ser7_imp2006" "ser7_imp2008" "ser7_imp2010" "ser7_imp2012" "ser7_imp2014"
[241] "bwc20_imp2018" "bwc20_imp2016" "bwc20_imp1995" "bwc20_imp1996" "bwc20_imp1998" "bwc20_imp2000"
[247] "bwc20_imp2002" "bwc20_imp2004" "bwc20_imp2006" "bwc20_imp2008" "bwc20_imp2010" "bwc20_imp2012"
[253] "bwc20_imp2014" "cogtot27_imp2018" "cogtot27_imp2016" "cogtot27_imp1995" "cogtot27_imp1996" "cogtot27_imp1998"
[259] "cogtot27_imp2000" "cogtot27_imp2002" "cogtot27_imp2004" "cogtot27_imp2006" "cogtot27_imp2008" "cogtot27_imp2010"
[265] "cogtot27_imp2012" "cogtot27_imp2014" "cogfunction2018" "cogfunction2016" "cogfunction1995" "cogfunction1996"
[271] "cogfunction1998" "cogfunction2000" "cogfunction2002" "cogfunction2004" "cogfunction2006" "cogfunction2008"
[277] "cogfunction2010" "cogfunction2012" "cogfunction2014" "proxy2018" "proxy2016" "proxy1995"
[283] "proxy1996" "proxy1998" "proxy2000" "proxy2002" "proxy2004" "proxy2006"
[289] "proxy2008" "proxy2010" "proxy2012" "proxy2014" "PC1_5A" "PC1_5B"
[295] "PC1_5C" "PC1_5D" "PC1_5E" "PC6_10A" "PC6_10B" "PC6_10C"
[301] "PC6_10D" "PC6_10E" "E4_AD_IGAP13" "E4_AD2_IGAP13" "E4_GWAD2NA_IGAP19" "E4_GWAD2WA_IGAP19"
[307] "E4_01AD2NA_IGAP19" "E4_01AD2WA_IGAP19" "myrs" "fyrs" "humcap" "famstr"
[313] "maternal" "soccap" "ave_fin_re" "fincap" "cses_index" "fin_inst"
[319] "hhidpn"
Upvotes: 1
Views: 85
Reputation: 18714
reshape
, pivot_longer
, and pivot_wider
are variations of the same idea. For any of them, you need a column that's unique for the data. So let's say you wanted to change all of the data into a longer format, then add a column with the row numbers and made that your independent column. In pivots in this answer, I just left the first column as the static field and pivoted everything else.
For the data you provided:
library(tidyverse)
df1 <- pivot_longer(df, cols = r1weight:bmi2010, # inclusively all columns between
names_to = "fields", values_to = "values")
head(df1)
# # A tibble: 6 × 3
# id fields values
# <fct> <chr> <dbl>
# 1 00000001 r1weight 56
# 2 00000001 r2weight 57
# 3 00000001 r3weight 56
# 4 00000001 r4weight 56
# 5 00000001 r5weight 55
# 6 00000001 r1height 151
# frame is now 60 observations with three columns
I also created a data structure with all of the column names you provided. (df4
is a vector of the column names you provided in your question.)
df5 <- matrix(ncol = length(df4), nrow = 100, dimnames = list(1:100, df4))
colnames(df5)[c(1, 2, 319)]
# [1] "hhid" "rahhidpn.x" "hhidpn"
df5 <- as.data.frame(df5)
df6 <- pivot_longer(df5, cols = rahhidpn.x:hhidpn, # inclusively all columns between
names_to = "fields", values_to = "values")
nrow(df6)
# [1] 31800
ncol(df6)
# [1] 3
Upvotes: 1