Learning
Learning

Reputation: 29

transpose grouped rows into from-to columns in r

I have one dataset and I would like to transpose these grouped rows into columns and calculate distance as well.

Let´s say I have a dataset like below Where "order" column is only related within groups and there are many kinds of mod and one group use only one mod. And of course dataframe has more rows with groups.

Group1 order Area_id start_date end_date mod Area_name lat long
A 1 0001 2022-01-01 2022-01-02 trk California 36.778261 -119.4179324
A 2 0002 2022-01-04 2022-01-07 trk Californib 36.778262 -119.4179325
A 3 0003 2022-01-10 2022-01-12 trk Californic 36.778264 -119.4179329
A 4 0004 2022-01-16 2022-01-20 trk Californid 36.778265 -119.4179330
B 1 0012 2022-02-11 2022-02-12 bus Barcelona 41.385063 2.1734036
B 2 0013 2022-02-14 2022-02-18 bus Barcelonb 41.385064 2.1734037
C 1 0020 2022-01-27 2022-01-29 car oaklanda 37.8043237 -122.2711133
C 2 0012 2022-02-02 2022-02-04 car Barcelona 41.385063 2.1734036
C 3 0009 2022-02-07 2022-02-10 car oaklandc 37.8043637 -122.2711237

... I would like to convert these dataframe like below. Where "Distance" columns are distance between "From_area" and "To_area". Could you please help?

Group From_area_id To_area_id From_area To_area start_date end_date mod Distance
A 0001 0002 California Californib 2022-01-02 2022-01-04 trk
A 0002 0003 Californib Californic 2022-01-07 2022-01-10 trk
A 0003 0004 Californic Californid 2022-01-12 2022-01-16 trk
B 0012 0013 Barcelona Barcelonb 2022-02-12 2022-02-14 bus
C 0020 0012 oaklanda Barcelona 2022-01-29 2022-02-02 car
C 0012 0009 Barcelona oaklandc 2022-02-04 2022-02-07 car

Upvotes: 1

Views: 169

Answers (1)

TarJae
TarJae

Reputation: 78927

Update after clarification: See comments:

library(dplyr)
library(geosphere)

df %>% 
  group_by(Group1) %>% 
  mutate(end_date1 = lead(start_date)) %>% 
  mutate(start_date = end_date, .keep="unused") %>% 
  mutate(Area_id = str_pad(as.character(Area_id), 4, pad="0")) %>%
  mutate(To_area_id = lead(Area_id), .after="Area_id") %>% 
  mutate(To_area = lead(Area_name), .after="Area_name") %>% 
  mutate(across(c(lat, long), ~lead(.), .names = "To_{.col}")) %>% 
  na.omit() %>% 
  rename(end_date = end_date1, From_area_id = Area_id, From_area = Area_name, From_lat=lat, From_long = long) %>% 
  rowwise() %>% 
  mutate(Distance = distHaversine(c(From_long, From_lat), 
                                  c(To_long, To_lat)), .keep="unused") %>% 
  select(Group = Group1, From_area_id, To_area_id,
         From_area, To_area, start_date, end_date, mod, Distance)
  Group From_area_id To_area_id From_area  To_area    start_date end_date   mod      Distance
  <chr> <chr>        <chr>      <chr>      <chr>      <chr>      <chr>      <chr>       <dbl>
1 A     0001         0002       California Californib 2022-01-02 2022-01-04 trk         0.112
2 A     0002         0003       Californib Californic 2022-01-07 2022-01-10 trk         0.225
3 A     0003         0004       Californic Californid 2022-01-12 2022-01-16 trk         0.112
4 B     0012         0013       Barcelona  Barcelonb  2022-02-12 2022-02-14 bus         0.112
5 C     0020         0012       oaklanda   Barcelona  2022-01-29 2022-02-02 car   9572260.   
6 C     0012         0009       Barcelona  oaklandc   2022-02-04 2022-02-07 car   9572257.   

First answer:

Something like this:

library(dplyr)

library(geosphere)

df %>% 
  group_by(Group1) %>% 
  mutate(Area_id = str_pad(as.character(Area_id), 4, pad="0")) %>%
  mutate(To_area_id = lead(Area_id), .after="Area_id") %>% 
  mutate(To_area = lead(Area_name), .after="Area_name") %>% 
  mutate(across(c(lat, long), ~lead(.), .names = "To_{.col}")) %>% 
  na.omit() %>% 
  rename(From_area_id = Area_id, From_area = Area_name, From_lat=lat, From_long = long) %>% 
  rowwise() %>% 
  mutate(Distance = distHaversine(c(From_long, From_lat), 
                                  c(To_long, To_lat)), .keep="unused")
  Group1 order From_area_id To_area_id start_date end_date   mod   From_area  To_area       Distance
  <chr>  <int> <chr>        <chr>      <chr>      <chr>      <chr> <chr>      <chr>            <dbl>
1 A          1 0001         0002       2022-01-01 2022-01-02 trk   California Californib       0.112
2 A          2 0002         0003       2022-01-04 2022-01-07 trk   Californib Californic       0.225
3 A          3 0003         0004       2022-01-10 2022-01-12 trk   Californic Californid       0.112
4 B          1 0012         0013       2022-02-11 2022-02-12 bus   Barcelona  Barcelonb        0.112
5 C          1 0020         0012       2022-01-27 2022-01-29 car   oaklanda   Barcelona  9572260.   
6 C          2 0012         0009       2022-02-02 2022-02-04 car   Barcelona  oaklandc   9572257.  

Upvotes: 0

Related Questions