King Frazier
King Frazier

Reputation: 343

sql update data from r dataframe not insert

I have a dataframe (df) with three columns (a,b,c)

I am inserting these values into SQL database using

df <- data.frame(a=1:10, b=10:1, c=11:20)
values <- paste("(",df$a,",", df$b,",",df$c,")", sep="", collapse=",")
cmd <- paste("insert into MyTable values ", values)
result <- sqlQuery(con, cmd, as.is=TRUE) 

Source: How to insert a dataframe into a SQL Server table?

My question is what is the update equivalent of this example ? How can i use

cmd <- paste("update MyTable values ", values)
result <- sqlQuery(con, cmd, as.is=TRUE) 

I tried sqlSave and sqlUpdate but it got sticky too soon with error messages like missing columns, index etc...so I am falling back to this example above and trying to learn how to use update statements based on that toy dataset. Any help is much appreciated folks.

Upvotes: 0

Views: 3869

Answers (3)

StatsStudent
StatsStudent

Reputation: 1594

I realize this post is several years old, but if anyone stumbles upon this now, I figured I'd just write a better answer. The method the OP was trying to use is quite painful and very inefficient. You can simply use the dbWriteTable commands. You can insert your data directly from a dataframe into a new table that doesn't previously exist or you can append an existing table by setting the append = TRUE option. See help(dbWriteTable) for additional options as well such as specifying column data types. Note, in the example below I have defined the SQL Server name as well as the database name, but I did not show their values here, for obvious security reasons. I've included the command and corresponding output.

> #Connect to Azure SQL Server database
> con <- DBI::dbConnect(odbc::odbc(), 
+                       Driver = "ODBC Driver 17 for SQL Server", 
+                       Server = SQL_SERVER_NAME,
+                       Database = SQL_DATABASE_NAME,
+                       Authentication = "ActiveDirectoryIntegrated")
> 
> 
> # Setup example, create two dataframes
> # df1:  dataframe to insert into new SQL table that doesn't exist yet
> # df2:  dataframe for appending
> df1 <- data.frame(a=1:10, b=10:1, c=11:20)
> df2 <- data.frame(a=31:30, b=30:31, c=41:50)
> 
> # dbWriteTable sends your df1 dataframe to SQL Server and builds the table
> DBI::dbWriteTable(conn = con, 
+                   name = "my_table_in_sql_server", 
+                   value = df1
+                   )
> 
> # If the table already exists, you can append it using the append = TRUE argument
> DBI::dbWriteTable(conn = con, 
+                   name = "my_table_in_sql_server", 
+                   value = df2, 
+                   append = TRUE
+ )
> 
> # You can verify your data was inserted now by simply querying the table 
> DBI::dbGetQuery(conn = con, statement = "select * from my_table_in_sql_server")
    a  b  c
1   1 10 11
2   2  9 12
3   3  8 13
4   4  7 14
5   5  6 15
6   6  5 16
7   7  4 17
8   8  3 18
9   9  2 19
10 10  1 20
11 31 30 41
12 30 31 42
13 31 30 43
14 30 31 44
15 31 30 45
16 30 31 46
17 31 30 47
18 30 31 48
19 31 30 49
20 30 31 50

Upvotes: 0

RDRR
RDRR

Reputation: 880

I know this question was posted over 4 years ago but I hope this will help out other userRs who are searching for an answer to this.

table <- [NAME OF THE TABLE YOU WANT TO UPDATE]
x <- [YOUR DATA SET]

# We'll need the column names of the table for our INSERT/UPDATE statement
rs <- dbSendQuery(con, paste0('SHOW COLUMNS FROM ', table, ';'))
col_names <- dbFetch(rs)
dbClearResult(rs)

# Find which columns are primary keys as we shouldn't need to update these
pri <- which(col_names$Key == "PRI")

# For each row in your data frame, build an UPDATE statement and query your db
for(i in 1:nrow(x)) {

    # Transform ith row of dataset into character vector
    values <- sapply(x[i, ], as.character)

    # Build the INSERT/UPDATE query
    myquery <- paste0("INSERT INTO ",
                      table,
                      "(", paste(col_names$Field, collapse = ", "), ") ", # column names   
                      "VALUES",
                      "('", paste(values, collapse = "', '"), "') ", # new records
                      "ON DUPLICATE KEY UPDATE ",
                      paste(col_names$Field[-pri], values[-pri], sep = " = '", collapse = "', "), # everything minus primary keys
                      "';")

    # Show full query for clarity
    cat("Performing query", i, "of", nrow(x), ":\n", myquery, "\n\n")

    # Send query to database
    dbSendQuery(con, myquery)

}

I just posted my solution to this problem on GitHub if you're looking for a more guided walkthrough.

Upvotes: 1

Shawn Mehan
Shawn Mehan

Reputation: 4568

So, firstly you have the wrong syntax for UPDATE. In general,

UPDATE table_name
SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];

so you can't build up the values as a concatenated vector as you have done. If you don't select a particular element with the WHERE you will update the value value1 across all the values in column1.

EDIT: If you can't match the condition, then you aren't actually updating, you are inserting, which has the forms:

It is possible to write the INSERT INTO statement in two forms.

The first form does not specify the column names where the data will be inserted, only their values:

INSERT INTO table_name
VALUES (value1,value2,value3,...);

The second form specifies both the column names and the values to be inserted:

INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);

If you want to do anything more complicated, you will need to build up the query with SQL, probably in something other than R first, at least to learn. A possibility for experimentation could be SQL fiddle if you aren't comfortable with SQL yet.

Upvotes: 3

Related Questions