Reputation: 29
I have one dataset and I would like to transpose these grouped rows into columns and calculate distance as well.
Let´s say I have a dataset like below Where "order" column is only related within groups and there are many kinds of mod and one group use only one mod. And of course dataframe has more rows with groups.
Group1 | order | Area_id | start_date | end_date | mod | Area_name | lat | long |
---|---|---|---|---|---|---|---|---|
A | 1 | 0001 | 2022-01-01 | 2022-01-02 | trk | California | 36.778261 | -119.4179324 |
A | 2 | 0002 | 2022-01-04 | 2022-01-07 | trk | Californib | 36.778262 | -119.4179325 |
A | 3 | 0003 | 2022-01-10 | 2022-01-12 | trk | Californic | 36.778264 | -119.4179329 |
A | 4 | 0004 | 2022-01-16 | 2022-01-20 | trk | Californid | 36.778265 | -119.4179330 |
B | 1 | 0012 | 2022-02-11 | 2022-02-12 | bus | Barcelona | 41.385063 | 2.1734036 |
B | 2 | 0013 | 2022-02-14 | 2022-02-18 | bus | Barcelonb | 41.385064 | 2.1734037 |
C | 1 | 0020 | 2022-01-27 | 2022-01-29 | car | oaklanda | 37.8043237 | -122.2711133 |
C | 2 | 0012 | 2022-02-02 | 2022-02-04 | car | Barcelona | 41.385063 | 2.1734036 |
C | 3 | 0009 | 2022-02-07 | 2022-02-10 | car | oaklandc | 37.8043637 | -122.2711237 |
... I would like to convert these dataframe like below. Where "Distance" columns are distance between "From_area" and "To_area". Could you please help?
Group | From_area_id | To_area_id | From_area | To_area | start_date | end_date | mod | Distance |
---|---|---|---|---|---|---|---|---|
A | 0001 | 0002 | California | Californib | 2022-01-02 | 2022-01-04 | trk | |
A | 0002 | 0003 | Californib | Californic | 2022-01-07 | 2022-01-10 | trk | |
A | 0003 | 0004 | Californic | Californid | 2022-01-12 | 2022-01-16 | trk | |
B | 0012 | 0013 | Barcelona | Barcelonb | 2022-02-12 | 2022-02-14 | bus | |
C | 0020 | 0012 | oaklanda | Barcelona | 2022-01-29 | 2022-02-02 | car | |
C | 0012 | 0009 | Barcelona | oaklandc | 2022-02-04 | 2022-02-07 | car |
Upvotes: 1
Views: 169
Reputation: 78927
Update after clarification: See comments:
library(dplyr)
library(geosphere)
df %>%
group_by(Group1) %>%
mutate(end_date1 = lead(start_date)) %>%
mutate(start_date = end_date, .keep="unused") %>%
mutate(Area_id = str_pad(as.character(Area_id), 4, pad="0")) %>%
mutate(To_area_id = lead(Area_id), .after="Area_id") %>%
mutate(To_area = lead(Area_name), .after="Area_name") %>%
mutate(across(c(lat, long), ~lead(.), .names = "To_{.col}")) %>%
na.omit() %>%
rename(end_date = end_date1, From_area_id = Area_id, From_area = Area_name, From_lat=lat, From_long = long) %>%
rowwise() %>%
mutate(Distance = distHaversine(c(From_long, From_lat),
c(To_long, To_lat)), .keep="unused") %>%
select(Group = Group1, From_area_id, To_area_id,
From_area, To_area, start_date, end_date, mod, Distance)
Group From_area_id To_area_id From_area To_area start_date end_date mod Distance
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 A 0001 0002 California Californib 2022-01-02 2022-01-04 trk 0.112
2 A 0002 0003 Californib Californic 2022-01-07 2022-01-10 trk 0.225
3 A 0003 0004 Californic Californid 2022-01-12 2022-01-16 trk 0.112
4 B 0012 0013 Barcelona Barcelonb 2022-02-12 2022-02-14 bus 0.112
5 C 0020 0012 oaklanda Barcelona 2022-01-29 2022-02-02 car 9572260.
6 C 0012 0009 Barcelona oaklandc 2022-02-04 2022-02-07 car 9572257.
First answer:
Something like this:
library(dplyr)
library(geosphere)
df %>%
group_by(Group1) %>%
mutate(Area_id = str_pad(as.character(Area_id), 4, pad="0")) %>%
mutate(To_area_id = lead(Area_id), .after="Area_id") %>%
mutate(To_area = lead(Area_name), .after="Area_name") %>%
mutate(across(c(lat, long), ~lead(.), .names = "To_{.col}")) %>%
na.omit() %>%
rename(From_area_id = Area_id, From_area = Area_name, From_lat=lat, From_long = long) %>%
rowwise() %>%
mutate(Distance = distHaversine(c(From_long, From_lat),
c(To_long, To_lat)), .keep="unused")
Group1 order From_area_id To_area_id start_date end_date mod From_area To_area Distance
<chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 A 1 0001 0002 2022-01-01 2022-01-02 trk California Californib 0.112
2 A 2 0002 0003 2022-01-04 2022-01-07 trk Californib Californic 0.225
3 A 3 0003 0004 2022-01-10 2022-01-12 trk Californic Californid 0.112
4 B 1 0012 0013 2022-02-11 2022-02-12 bus Barcelona Barcelonb 0.112
5 C 1 0020 0012 2022-01-27 2022-01-29 car oaklanda Barcelona 9572260.
6 C 2 0012 0009 2022-02-02 2022-02-04 car Barcelona oaklandc 9572257.
Upvotes: 0