Reputation: 457
Is there anything terribly inefficient here? It seems like this process is taking way longer than it should. I am parsing many JSON files each with a JsonArray of objects. Maybe someone with more experience could point out an error in this method of parsing the JSON into objects, thereby saving me a ton of time.
Also, memory usage slowly creeps upward MB by MB, sometimes causing OutOfMemoryExceptions.
public void Parse()
{
    using (BabysFirstsUsersDataEntities db = new BabysFirstsUsersDataEntities())
    {
        foreach (var path in Directory.EnumerateFiles(@"C:\examplepath\").OrderBy(f => f))
        {
            string jsonString = System.IO.File.ReadAllText(path);
            JToken tok = JObject.Parse(jsonString);
            Debug.WriteLine("path: " + path);

            foreach (var x in tok.First.First)
            {
                JsonUserImageDTO jdto = x.ToObject<JsonUserImageDTO>();
                UserImageList uil = jdto.ToDataModel();

                if (uil.REID != null)
                    db.UserImageLists.Add(uil);
            }
        }
        db.SaveChanges();
    }
}
Below is an example of what the JSON in each .json file looks like. Note that there are around 1000 of these files and each can have thousands of such entries:
{
    "results": [
        {
            "ACL": {
                "asdf": {
                    "read": true,
                    "write": true
                },
                "role:admin": { "read": true }
            },
            "REID": "exampleID",
            "createdAt": "datetime-string",
            "email": "example",
            "objectId": "example",
            "updatedAt": "datetimestring",
            "urlCount": 1,
            "urlList": [ "exampleurl" ]
        },
        {
            "ACL": {
                "asdf": {
                    "read": true,
                    "write": true
                },
                "role:admin": { "read": true }
            },
            "REID": "exampleID",
            "createdAt": "datetime-string",
            "email": "example",
            "objectId": "example",
            "updatedAt": "datetimestring",
            "urlCount": 1,
            "urlList": [ "exampleurl" ]
        }
    ]
}
Upvotes: 0
Views: 525
Reputation: 117284
Have you actually profiled your code? See Eric Lippert's performance rant: use a profiler or other analysis tool to determine empirically where the bottleneck is before you start investigating alternatives. For instance, your actual performance problem may be somewhere in the BabysFirstsUsersDataEntities db class.
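For a quick first measurement before reaching for a full profiler, a crude Stopwatch wrapper can show which phase of the loop dominates. This is a hypothetical sketch, not part of the original code; the `Timing.Measure` name is mine:

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the given work, prints how long it took, and passes the result through.
    // Crude, but often enough to locate the dominant phase of a loop.
    public static T Measure<T>(string label, Func<T> work)
    {
        var sw = Stopwatch.StartNew();
        var result = work();
        sw.Stop();
        Console.WriteLine($"{label}: {sw.ElapsedMilliseconds} ms");
        return result;
    }
}
```

You could then wrap each phase, e.g. `var jsonString = Timing.Measure("read", () => File.ReadAllText(path));`, and compare the totals.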
That being said, my immediate reaction is that you have too many intermediate representations of your data, the construction, population and garbage collection of which all take time. These include:

- The jsonString, which may be large enough to go on the large object heap, and thus permanently impair the performance and memory use of your process.
- The JToken tok representation of your entire JSON hierarchy.
- The intermediate JsonUserImageDTO objects.

What I would suggest is to eliminate as many of these intermediate representations as possible. As suggested in the documentation, you should load directly from a stream rather than loading to a string and parsing that string.
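As a minimal self-contained sketch of the stream approach, using Json.NET's JsonTextReader (the small DTO classes here are placeholders standing in for the full DTO model defined further down):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Placeholder DTOs: only the fields needed for this sketch.
public class ResultDTO
{
    public string REID { get; set; }
    public List<string> urlList { get; set; }
}

public class RootDTO
{
    public List<ResultDTO> results { get; set; }
}

public static class StreamingParser
{
    // Deserialize one file straight from its stream: no jsonString, no JToken.
    public static RootDTO ParseFile(string path)
    {
        using (var file = File.OpenRead(path))
        using (var text = new StreamReader(file))
        using (var json = new JsonTextReader(text))
        {
            return JsonSerializer.CreateDefault().Deserialize<RootDTO>(json);
        }
    }
}
```

This alone removes the string and, if you deserialize directly to DTOs, the JToken layer as well.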
You can also eliminate the JToken tok
by populating your data model directly. Let's say your BabysFirstsUsersDataEntities
looks like this (I'm just guessing here):
public class BabysFirstsUsersDataEntities
{
    public BabysFirstsUsersDataEntities() { this.UserImageLists = new List<UserImageList>(); }
    public List<UserImageList> UserImageLists { get; set; }
}

public class UserImageList
{
    public string email { get; set; }
    public List<string> urlList;
}
And your DTO model looks something like this model provided by http://json2csharp.com/:
public class RootObjectDTO
{
    public ICollection<JsonUserImageDTO> results { get; set; }
}

public class JsonUserImageDTO
{
    public ACL ACL { get; set; }
    public string REID { get; set; }
    public string createdAt { get; set; }
    public string email { get; set; }
    public string objectId { get; set; }
    public string updatedAt { get; set; }
    public int urlCount { get; set; }
    public List<string> urlList { get; set; }

    public UserImageList ToDataModel()
    {
        return new UserImageList { email = email, urlList = urlList };
    }
}

public class Asdf
{
    public bool read { get; set; }
    public bool write { get; set; }
}

public class RoleAdmin
{
    public bool read { get; set; }
}

public class ACL
{
    public Asdf asdf { get; set; }

    [JsonProperty("role:admin")]
    public RoleAdmin RoleAdmin { get; set; }
}
Then create the following generic ConvertingCollection<TIn, TOut>
utility class:
public class ConvertingCollection<TIn, TOut> : BaseConvertingCollection<TIn, TOut, ICollection<TIn>>
{
    readonly Func<TOut, TIn> toInner;

    public ConvertingCollection(Func<ICollection<TIn>> getCollection, Func<TIn, TOut> toOuter, Func<TOut, TIn> toInner)
        : base(getCollection, toOuter)
    {
        if (toInner == null)
            throw new ArgumentNullException();
        this.toInner = toInner;
    }

    protected TIn ToInner(TOut outer) { return toInner(outer); }

    public override void Add(TOut item)
    {
        Collection.Add(ToInner(item));
    }

    public override void Clear()
    {
        Collection.Clear();
    }

    public override bool IsReadOnly { get { return Collection.IsReadOnly; } }

    public override bool Remove(TOut item)
    {
        return Collection.Remove(ToInner(item));
    }

    public override bool Contains(TOut item)
    {
        return Collection.Contains(ToInner(item));
    }
}
public abstract class BaseConvertingCollection<TIn, TOut, TCollection> : ICollection<TOut>
    where TCollection : ICollection<TIn>
{
    readonly Func<TCollection> getCollection;
    readonly Func<TIn, TOut> toOuter;

    public BaseConvertingCollection(Func<TCollection> getCollection, Func<TIn, TOut> toOuter)
    {
        if (getCollection == null || toOuter == null)
            throw new ArgumentNullException();
        this.getCollection = getCollection;
        this.toOuter = toOuter;
    }

    protected TCollection Collection { get { return getCollection(); } }
    protected TOut ToOuter(TIn inner) { return toOuter(inner); }

    #region ICollection<TOut> Members

    public abstract void Add(TOut item);
    public abstract void Clear();

    public virtual bool Contains(TOut item)
    {
        var comparer = EqualityComparer<TOut>.Default;
        foreach (var member in Collection)
            if (comparer.Equals(item, ToOuter(member)))
                return true;
        return false;
    }

    public void CopyTo(TOut[] array, int arrayIndex)
    {
        foreach (var item in this)
            array[arrayIndex++] = item;
    }

    public int Count { get { return Collection.Count; } }
    public abstract bool IsReadOnly { get; }
    public abstract bool Remove(TOut item);

    #endregion

    #region IEnumerable<TOut> Members

    public IEnumerator<TOut> GetEnumerator()
    {
        foreach (var item in Collection)
            yield return ToOuter(item);
    }

    #endregion

    #region IEnumerable Members

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    #endregion
}
You can now populate your db
directly as follows:
var rootDTO = new RootObjectDTO
{
    results = new ConvertingCollection<UserImageList, JsonUserImageDTO>(
        () => db.UserImageLists,
        (x) => { throw new NotImplementedException(); },
        (x) => x.ToDataModel())
};

using (var stream = File.Open(path, FileMode.Open))
using (var reader = new StreamReader(stream))
{
    JsonSerializer.CreateDefault().Populate(reader, rootDTO);
}
By populating a preallocated rootDTO
and ConvertingCollection<UserImageList, JsonUserImageDTO>
, your db.UserImageLists
will get populated with the contents of the JSON with fewer intermediate representations.
Upvotes: 1
Reputation: 5199
It looks like there could be several places that could be causing the slowness:

- reading each entire file into a string
- parsing that string into a JToken hierarchy
- building two more representations of each entry (jdto, then uil)

It may be worth profiling the code to find out exactly what part is taking longer than you'd expect. That said, there are some things you can do to generally improve this code:

- Deserialize from a stream instead of reading the whole file into a string and then parsing it into tok. See the second example in the docs for how to use a stream. Actually, in your case you have the same information in memory four times: the string, tok, jdto, and uil. Which brings me to the next point..
- Filter the paths returned by EnumerateFiles(). There is no sense in deserializing a file if you are not going to do anything with it.

Upvotes: 1
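The filtering suggestion can be as simple as a search pattern plus a Where clause over the paths, applied before any file is opened. A sketch (the size-based filter condition here is a hypothetical example, not from the answer):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PathFilter
{
    // Only enumerate files we actually intend to parse; anything filtered out
    // here never costs I/O or deserialization time.
    public static IEnumerable<string> JsonPaths(string folder)
    {
        return Directory.EnumerateFiles(folder, "*.json")       // pattern filter
                        .Where(p => new FileInfo(p).Length > 2) // hypothetical: skip effectively-empty files
                        .OrderBy(p => p);
    }
}
```

Because EnumerateFiles is lazy, the filter runs as paths are yielded rather than after building a full list.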
Reputation: 141
You can define matching classes and then deserialize the JSON into them. Example:
JsonConvert.DeserializeObject<RootObject>(jsonString);
public class Asdf
{
    public bool read { get; set; }
    public bool write { get; set; }
}

public class RoleAdmin
{
    public bool read { get; set; }
}

public class ACL
{
    public Asdf asdf { get; set; }

    [JsonProperty("role:admin")]
    public RoleAdmin RoleAdmin { get; set; }
}

public class Result
{
    public ACL ACL { get; set; }
    public string REID { get; set; }
    public string createdAt { get; set; }
    public string email { get; set; }
    public string objectId { get; set; }
    public string updatedAt { get; set; }
    public int urlCount { get; set; }
    public List<string> urlList { get; set; }
}

public class RootObject
{
    public List<Result> results { get; set; }
}
Upvotes: 0