Reputation: 31
I have a set of data tables in R that I am attempting to manipulate with SQL using the package sqldf
, but no luck getting the correct output so far.
The primary table has records which are enumerated by IDs, while the secondary table has the same IDs and a separate set of variables that I need to use to remove records from the primary table. The IDs are the only thing which link the observations in the two tables.
What I needed to have happen was for the records to be removed from the primary only if either of two requirements are met: if height='1ft'
or if weight='10kg'
in the secondary table, then all the records with the corresponding IDs of those observations should be deleted from the primary table.
The SQL queries I have attempted to use through sqldf either produce syntax errors or work incorrectly and remove every record in the primary table, because only the condition where the IDs are compared is being evaluated (the full set of IDs exists in both tables, so if only that clause is evaluated everything in the primary gets deleted).
Here is an example with two different queries to show what I was getting at:
## Setup the primary dataframe
ID = c(1, 2, 3)
etc1 = c("stuff", "stuff", "stuff")
etc2 = c("stuff", "stuff", "stuff")
prim = data.frame(ID, etc1, etc2)
## Setup the secondary
height = c("1ft", "2ft", "3ft")
weight = c("5kg", "10kg", "20kg")
sec = data.frame(ID, height, weight)
sqldf("DELETE FROM prim
WHERE prim.ID=sec.ID
AND (sec.height='1ft' OR sec.weight='10kg')")
This first query obviously does nothing because there is nothing in the query which references the secondary table
sqldf("DELETE FROM prim
WHERE EXISTS (SELECT * FROM sec WHERE sec.ID=prim.ID
AND (sec.height='1ft' OR sec.weight='10kg'))")
The second approach above is modified from a basic SQL tutorial but results in every record being removed from the primary table - apparently the AND statement is not getting evaluated, but I am not sure why. I have tried formatting this in several different ways. If it had worked correctly, ID 1 would be dropped because of a match on height, and ID 2 would have been dropped due to a match on weight, leaving only ID 3 present.
Update: I've also tried modifying the previous approach to include nesting the WHERE statements with no success. It still results in 0 results returned.
sqldf("DELETE FROM prim
WHERE EXISTS
(SELECT * FROM sec WHERE EXISTS
(SELECT * FROM sec WHERE sec.height='1ft' OR sec.weight='10kg')
AND sec.ID = prim.ID)")
Additional Update: I attempted to solve the problem using JOIN to first pull the identifying columns from the secondary table into the primary, and then deleting from there with a second command. This approach also for some reason deletes all the records in the primary after running the delete command (the third row should be preserved), and produces a further warning message
prim <- sqldf("SELECT ID, etc1, etc2, height, weight
FROM sec
JOIN prim USING(ID)")
sqldf("DELETE FROM prim
WHERE height='1ft' OR weight='10kg'")
Warning message:
In rsqlite_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
Upvotes: 0
Views: 262
Reputation: 1
Simple solution:
prim <- sqldf("SELECT *
FROM prim
JOIN sec USING(ID)
WHERE sec.height != '1ft' AND sec.weight !='10kg'")
Upvotes: 0
Reputation: 31
Well I found an answer that took care of all the problems I was having with regards to getting output with no records in it.
It turns out most of the delete commands I tried above would have worked if I had included a command call to sqlite that would send back the remaining data after the delete is performed. I did not know it did not do this automatically.
The following example does not work:
prim <- sqldf("SELECT ID, etc1, etc2, height, weight
FROM sec
JOIN prim USING(ID)")
sqldf("DELETE FROM prim
WHERE height='1ft' OR weight='10kg'")
However, adding the second command in the concatenation produces the desired result
sqldf(c("DELETE FROM prim
WHERE height='1ft' OR weight='10kg'",
"SELECT * FROM prim"))
So for my full example the second query I ran produces the desired result by just appending the additional select onto the end
sqldf(c("DELETE FROM prim
WHERE EXISTS (SELECT * FROM sec WHERE sec.ID=prim.ID
AND (sec.height='1ft' OR sec.weight='10kg'))",
"SELECT * FROM prim"))
Upvotes: 1