skjcyber
skjcyber

Reputation: 5957

Fastest way to Read huge volume of data from DB

I have a table which contains around 500 Million records. I am reading the data from table and storing those in a Dictionary.

EDIT: I am loading the data into dictionary because these data needs to be compared with another volume of data coming from an indexing server.

My code is as below:

public static void GetDetailsFromDB()
{
    string sqlStr = "SELECT ID, Name ,Age, email ,DOB ,Address ,Affiliation ,Interest ,Homepage FROM Author WITH (NOLOCK) ORDER BY ID";
    SqlCommand cmd = new SqlCommand(sqlStr, _con);
    cmd.CommandTimeout = 0;

    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            //Author Class
            Author author = new Author();

            author.id = Convert.ToInt32(reader["ID"].ToString());
            author.Name = reader["Name"].ToString().Trim();
            author.age = Convert.ToInt32(reader["Age"].ToString());
            author.email = reader["email"].ToString().Trim();
            author.DOB = reader["DOB"].ToString().Trim();
            author.Address = reader["Address"].ToString().Trim();
            author.Affiliation = reader["Affiliation"].ToString().Trim();
            author.Homepage = reader["Homepage"].ToString().Trim();

            string interests = reader["Interest"].ToString().Trim();
            author.interest = interests.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Select(p => p.Trim()).ToList();

            if (!AuthorDict.ContainsKey(author.id))
            {
                AuthorDict.Add(author.id, author);
            }

            if (AuthorDict.Count % 1000000 == 0)
            {
                Console.WriteLine("{0}M author loaded.", AuthorDict.Count / 1000000);
            }
        }
    }
}

This process is taking long time to read and store all 500 Million records from DB. Also, the RAM usage is very high.

Can this be optimized ? also, can the running time be decreased ? any help is appreciated.

Upvotes: 0

Views: 2615

Answers (1)

Daniel Kelley
Daniel Kelley

Reputation: 7737

If I hold my nose I can come up with the following optimisations:

  1. Store the ordinal positions of your fields in local variables and reference the fields in your reader using these ordinal variables.

  2. Do not call ToString on the reader and convert - just get the value out in the correct type.

  3. Check for the existence of the author id in the AuthorDict as soon as you have the id. Don't even create the Author instance if you don't need it.

    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        var idOrdinal = reader.GetOrdinal("ID");
        //extract other ordinal positions and store here
    
        while (reader.Read())
        {
            var id = reader.GetInt32(idOrdinal);
    
            if (!AuthorDict.ContainsKey(id))
            {
                Author author = new Author();
                author.id = reader.GetInt32(idOrdinal);
                ...
            }
        }
    }
    

Upvotes: 3

Related Questions