Reputation: 367
I am trying to spread a data frame, but I am not quite familiar with spread()
and gather()
.
Below is a sample of my data. It has 9 rows all with the same Application.Number
. I would like to end up with one row per Application.Number-Decicion
combination. The remaining variables date_generated
date_decided
time_to_decision
and text
have to be repeated for each Application.Number-Decicion
combination or the last one should be taken. The data is already sorted by Application.Number
and date_generated
.
structure(list(Application.Number = c(80749L, 80749L, 80749L,
80749L, 80749L, 80749L, 80749L, 80749L, 80749L), Decision = c("Invalid",
"Invalid", "Invalid", "Invalid", "Invalid", "Invalid", "Approved",
"Approved", "Approved"), date_generated = structure(c(1521810060,
1521810060, 1523523840, 1523536500, 1524036720, 1524136380, 1524137460,
1524137460, 1524137460), class = c("POSIXct", "POSIXt"), tzone = ""),
date_decided = structure(c(1522155960, 1522155660, 1523534400,
1523600520, 1524127140, 1524136740, 1524211800, 1524211740,
1524211200), class = c("POSIXct", "POSIXt"), tzone = ""),
time_to_decision = c(4.00347222222222, 4, 0.122222222222222,
0.740972222222222, 1.04652777777778, 0.00416666666666667,
0.860416666666667, 0.859722222222222, 0.853472222222222),
text = c("rIUQRmOkyZ", "ZxdYUr16NR", "8IIipoleOV", "nLuIgToxcT",
"xYFksrws87", "N2oECMtgQo", "RKcrBcBFI2", "jaH438byVt", "80ggA2hZr7"
)), row.names = 15880:15888, class = "data.frame")
EDIT: Decided that the output should be just one row and all rows should pivot around Application.Number
.
I ended up making a separate data frame with the duplicates and joining it back to the unique rows.
There must be a better way to do it.
Upvotes: 0
Views: 74
Reputation: 3235
Well, you could do it, but I agree with user 42-, it will lead to problems because of data format later:
> gather(x, "key", "val", -Application.Number, -Decision)
Application.Number Decision key val
1 80749 Invalid date_generated 1521810060
2 80749 Invalid date_generated 1521810060
3 80749 Invalid date_generated 1523523840
4 80749 Invalid date_generated 1523536500
5 80749 Invalid date_generated 1524036720
6 80749 Invalid date_generated 1524136380
7 80749 Approved date_generated 1524137460
8 80749 Approved date_generated 1524137460
9 80749 Approved date_generated 1524137460
10 80749 Invalid date_decided 1522155960
11 80749 Invalid date_decided 1522155660
12 80749 Invalid date_decided 1523534400
13 80749 Invalid date_decided 1523600520
14 80749 Invalid date_decided 1524127140
15 80749 Invalid date_decided 1524136740
16 80749 Approved date_decided 1524211800
17 80749 Approved date_decided 1524211740
18 80749 Approved date_decided 1524211200
19 80749 Invalid time_to_decision 4.00347222222222
20 80749 Invalid time_to_decision 4
21 80749 Invalid time_to_decision 0.122222222222222
22 80749 Invalid time_to_decision 0.740972222222222
23 80749 Invalid time_to_decision 1.04652777777778
24 80749 Invalid time_to_decision 0.00416666666666667
25 80749 Approved time_to_decision 0.860416666666667
26 80749 Approved time_to_decision 0.859722222222222
27 80749 Approved time_to_decision 0.853472222222222
28 80749 Invalid text rIUQRmOkyZ
29 80749 Invalid text ZxdYUr16NR
30 80749 Invalid text 8IIipoleOV
31 80749 Invalid text nLuIgToxcT
32 80749 Invalid text xYFksrws87
33 80749 Invalid text N2oECMtgQo
34 80749 Approved text RKcrBcBFI2
35 80749 Approved text jaH438byVt
36 80749 Approved text 80ggA2hZr7
Warning:
attributes are not identical across measure variables;
they will be dropped
The warning already is a hint: You have converted all your value columns data_generated
, date_decided
, time_to_decision
and text
to the most general data format which can hold all these values: Character strings. See how your dates were converted to seconds since the epoch: You lost for example time zone information.
So in short, you can do it but I don't think you should. Because you did not show your use case or any context, I can't propose a better solution though.,
Upvotes: 1