hummeale
hummeale

Reputation: 1

Is it possible to transform a SQL Code to a R Code?

I got following problem: I got a Database and have to evaluate some Deltas e.g. I already computed this with SQL and it works well but now I need to try it with R and I didnt work much with R so I'm not quite sure if my problems are solvable with R. The best case would be if i can transform my sql code to a R code. Otherwise i will present my problem to you:

+----+------+------------+
| a  |  b   |     c      |
+----+------+------------+
| 10 |    2 | 31.01.2020 |
| 11 |    5 | 31.01.2020 |
| 10 |    3 | 31.01.2020 |
| 10 |    2 | 31.12.2019 |
| 11 |    4 | 31.12.2019 |
| 10 |   2  | 31.12.2019 |
+----+------+------------+

dput:

structure(list(a = c(10L, 11L, 10L, 10L, 11L, 10L), b = c(2L, 
5L, 3L, 2L, 4L, 2L), c = c("31.01.2020", "31.01.2020", "31.01.2020", 
"31.12.2019", "31.12.2019", "31.12.2019")), row.names = c(NA, 
-6L), class = "data.frame")

delta

+----+---+
| 10 | 1 |
| 11 | 1 |
+----+---+

It's my first time in this community so I hope you can see my picture. It's an example for my problem:

In the first step i need to summarize column b for the same date (c) and the same number in a). So the result for 31.12.2020 would be : 10= 5 and 11= 5 and the same for the 31.12.2019.

Until this point it works but in the next step I need to calculate an the delta of b) s.t. column a) and c)

In other words: For the number 10 in a) i calculated 5 in 31.01.2020 in 31.12.2019 i calculated 4 for 10. Now i have to calculate 5-4, but it's not possible for me to program it.

So i hope some of you understand my problem and can help me to find a smart way i could solve this problem.

I would be very thankful if you can help me.

Upvotes: 0

Views: 75

Answers (3)

IceCreamToucan
IceCreamToucan

Reputation: 28675

You can do this with data.table. First convert c to a date column.

Then take the sum of b by each (a, c) group, then within the a groups of the resulting table, take the diff of this sum. The diff will be in the correct direction because keyby orders the results by the grouping columns.

library(data.table)
setDT(df)
df[, c := as.Date(c, format = '%d.%m.%Y')]

df[, .(bsum = sum(b)), keyby = .(a, c)
  ][, .(bsum_diff = diff(bsum)), by = a]
#     a bsum_diff
# 1: 10         1
# 2: 11         1

Upvotes: 1

user10917479
user10917479

Reputation:

Using the package dplyr, try something like this:

library(dplyr)

df %>%
  mutate(c_date = as.Date(c, format = "%d.%m.%Y")) %>% # convert to a Date in R
  group_by(a, c_date) %>% 
  summarize(b = sum(b)) %>% # sum grouped by a and c
  group_by(a) %>%
  arrange(c_date) %>% # sort on the date
  summarize(delta = diff(b)) # take the difference, grouped on a

Which will give:

# A tibble: 2 x 2
      a delta
  <int> <int>
1    10     1
2    11     1

Upvotes: 1

David M
David M

Reputation: 33

I don't know if is possible translate SQL to R code, but working with databases via R is not so hard. In a big picture: You need to set a SQL connection: DBI::dbconnect(). Next, with the function tbl() and the package dplyr you start to replicate a sql select statement. Here you can find more information: https://db.rstudio.com

Upvotes: 0

Related Questions