Reputation: 1
I got following problem: I got a Database and have to evaluate some Deltas e.g. I already computed this with SQL and it works well but now I need to try it with R and I didnt work much with R so I'm not quite sure if my problems are solvable with R. The best case would be if i can transform my sql code to a R code. Otherwise i will present my problem to you:
+----+------+------------+
| a | b | c |
+----+------+------------+
| 10 | 2 | 31.01.2020 |
| 11 | 5 | 31.01.2020 |
| 10 | 3 | 31.01.2020 |
| 10 | 2 | 31.12.2019 |
| 11 | 4 | 31.12.2019 |
| 10 | 2 | 31.12.2019 |
+----+------+------------+
dput:
structure(list(a = c(10L, 11L, 10L, 10L, 11L, 10L), b = c(2L,
5L, 3L, 2L, 4L, 2L), c = c("31.01.2020", "31.01.2020", "31.01.2020",
"31.12.2019", "31.12.2019", "31.12.2019")), row.names = c(NA,
-6L), class = "data.frame")
delta
+----+---+
| 10 | 1 |
| 11 | 1 |
+----+---+
It's my first time in this community so I hope you can see my picture. It's an example for my problem:
In the first step i need to summarize column b for the same date (c) and the same number in a). So the result for 31.12.2020 would be : 10= 5 and 11= 5 and the same for the 31.12.2019.
Until this point it works but in the next step I need to calculate an the delta of b) s.t. column a) and c)
In other words: For the number 10 in a) i calculated 5 in 31.01.2020 in 31.12.2019 i calculated 4 for 10. Now i have to calculate 5-4, but it's not possible for me to program it.
So i hope some of you understand my problem and can help me to find a smart way i could solve this problem.
I would be very thankful if you can help me.
Upvotes: 0
Views: 75
Reputation: 28675
You can do this with data.table. First convert c
to a date column.
Then take the sum of b
by each (a, c)
group, then within the a
groups of the resulting table, take the diff
of this sum. The diff
will be in the correct direction because keyby
orders the results by the grouping columns.
library(data.table)
setDT(df)
df[, c := as.Date(c, format = '%d.%m.%Y')]
df[, .(bsum = sum(b)), keyby = .(a, c)
][, .(bsum_diff = diff(bsum)), by = a]
# a bsum_diff
# 1: 10 1
# 2: 11 1
Upvotes: 1
Reputation:
Using the package dplyr
, try something like this:
library(dplyr)
df %>%
mutate(c_date = as.Date(c, format = "%d.%m.%Y")) %>% # convert to a Date in R
group_by(a, c_date) %>%
summarize(b = sum(b)) %>% # sum grouped by a and c
group_by(a) %>%
arrange(c_date) %>% # sort on the date
summarize(delta = diff(b)) # take the difference, grouped on a
Which will give:
# A tibble: 2 x 2
a delta
<int> <int>
1 10 1
2 11 1
Upvotes: 1
Reputation: 33
I don't know if is possible translate SQL to R code, but working with databases via R is not so hard. In a big picture: You need to set a SQL connection: DBI::dbconnect(). Next, with the function tbl() and the package dplyr you start to replicate a sql select statement. Here you can find more information: https://db.rstudio.com
Upvotes: 0