04 April, 2012

Split Large Xml file into smaller valid Xml Files using LinqToXml

Splitting a large XML file using LINQ to XML.

The main performance hit is the initial load of the XML into memory: XElement.Load reads the whole document before anything is split (I must find a way to load each node only as needed). There is certainly room for improvement, but for now it gets the job done.

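using System;
using System.Linq;
using System.Xml.Linq;

string path = @"c:\large.xml";
int nrParts = 220;

// XElement.Load reads the whole document into memory (the main performance hit)
XElement root = XElement.Load(path);

// empty copy of the root: same name and attributes, no child nodes
// (RemoveAll would also strip the root's attributes)
XElement cleanedRoot = new XElement(root.Name, root.Attributes());
XNode[] children = root.Nodes().ToArray();
int childrenCount = children.Length;

int nodeCountPerPart = (int)Math.Ceiling(childrenCount / (double)nrParts);

for (int i = 0; i < nrParts; i++)
{
    int indexBase = i * nodeCountPerPart;
    if (indexBase >= childrenCount)
    {
        break; // every node has been written; don't save empty parts
    }

    XElement newRoot = new XElement(cleanedRoot);
    int nodesInThisPart = Math.Min(nodeCountPerPart, childrenCount - indexBase);
    for (int j = 0; j < nodesInThisPart; j++)
    {
        // the node still belongs to root, so Add inserts a copy of it
        newRoot.Add(children[indexBase + j]);
    }

    newRoot.Save(string.Format(@"c:\large_part{0}.xml", i + 1));
}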
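The initial load could be avoided by streaming the file with XmlReader and reading one child element at a time with XNode.ReadFrom. This is a sketch under some assumptions: StreamingXmlSplitter and its onPart callback are hypothetical names, only element children of the root are kept (comments, processing instructions and text directly under the root are skipped), and it takes a part size instead of a number of parts, because the total child count isn't known without reading the file first.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: splits the root's child elements into parts without loading the whole file.
public static class StreamingXmlSplitter
{
    // Reads the root's child elements one at a time and hands each finished part
    // (at most nodesPerPart children) to onPart. Returns the number of parts.
    public static int Split(TextReader input, int nodesPerPart, Action<int, XElement> onPart)
    {
        if (nodesPerPart < 1) throw new ArgumentOutOfRangeException(nameof(nodesPerPart));

        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using XmlReader reader = XmlReader.Create(input, settings);

        // Empty copy of the root element: same name and attributes (including xmlns declarations).
        reader.MoveToContent();
        var template = new XElement(XName.Get(reader.LocalName, reader.NamespaceURI));
        if (reader.MoveToFirstAttribute())
        {
            do
            {
                // LINQ to XML names a default namespace declaration plain "xmlns".
                XName name = reader.Prefix.Length == 0 && reader.LocalName == "xmlns"
                    ? XName.Get("xmlns")
                    : XName.Get(reader.LocalName, reader.NamespaceURI);
                template.Add(new XAttribute(name, reader.Value));
            } while (reader.MoveToNextAttribute());
            reader.MoveToElement();
        }

        XElement current = null;
        int inCurrent = 0;
        int partCount = 0;

        reader.Read(); // step inside the root element
        while (!reader.EOF)
        {
            if (reader.NodeType != XmlNodeType.Element)
            {
                reader.Read(); // skip comments, text, processing instructions and the root's end tag
                continue;
            }

            // ReadFrom consumes the whole child element and leaves the reader on the next node.
            var child = (XElement)XNode.ReadFrom(reader);
            current ??= new XElement(template);
            current.Add(child);

            if (++inCurrent == nodesPerPart)
            {
                onPart(++partCount, current);
                current = null;
                inCurrent = 0;
            }
        }

        if (current != null)
        {
            onPart(++partCount, current);
        }

        return partCount;
    }
}
```

To write files like the code above, the callback can save each part: using (var input = File.OpenText(@"c:\large.xml")) StreamingXmlSplitter.Split(input, 1000, (n, part) => part.Save(string.Format(@"c:\large_part{0}.xml", n))); Only the part being built is kept in memory.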


UPDATE: Using this method, +Master Aucrun has created a simple WinForms project to easily split an XML file. Thanks, +Master Aucrun. You can get it here.

22 March, 2012

WP7 DatabaseSchemaUpdater multiple version update

While testing the Windows Phone 7 DatabaseSchemaUpdater I started getting the following error: "A column ID occurred more than once in the specification."

I searched the web for a while and got nowhere.

So, analyzing my code for the 10th time, I noticed something. On the second update (version 1 to version 2) I was adding a new table:

schemaUpdater.AddTable<ClassName>();

And on the third update (v2 to v3) I was adding a new column to that same table:

schemaUpdater.AddColumn<ClassName>("PropertyName");

If the user updates the app regularly, everything goes smoothly, but if the user skips a few versions and then has to go through all the updates at once, the app crashes.

The problem is that AddTable creates the table from the current version of the class, which already contains the column that the next update will try to add. At that point the update crashes, and it keeps crashing on every later attempt unless the DatabaseSchemaVersion is properly updated.

This can lead to a lot of frustration and data loss for the user and must be taken into account when updating.

Since I couldn't find a way to read the current schema and check a table's columns before adding them, I made a simple workaround.

During an update session, keep a list of the tables added in that session, and when adding a column, only add it if its table is not on that list.
Check the example below:



// Members of the app's DataContext-derived class.
// Requires: using System.Collections.Generic; using Microsoft.Phone.Data.Linq;

private const int LatestDatabaseVersion = 2;

private List<string> tablesAddedInThisUpdateSession = null;
public List<string> TablesAddedInThisUpdateSession
{
    get
    {
        if (tablesAddedInThisUpdateSession == null)
        {
            tablesAddedInThisUpdateSession = new List<string>();
        }

        return tablesAddedInThisUpdateSession;
    }
}

public void UpdateIfNeeded()
{
    // create an instance of DatabaseSchemaUpdater
    DatabaseSchemaUpdater schemaUpdater = this.CreateDatabaseSchemaUpdater();

    int version = schemaUpdater.DatabaseSchemaVersion;

    // apply each update in order until the schema reaches the latest version
    while (version < LatestDatabaseVersion)
    {
        switch (version)
        {
            case 0:
                version = UpdateFromVersion0ToVersion1();
                break;
            case 1:
                version = UpdateFromVersion1ToVersion2();
                break;
            default:
                break;
        }
    }
}

private int UpdateFromVersion0ToVersion1()
{
    DatabaseSchemaUpdater schemaUpdater = this.CreateDatabaseSchemaUpdater();

    // the new table is created with all the columns of the current Contact class
    schemaUpdater.AddTable<Contact>();
    TablesAddedInThisUpdateSession.Add("Contact");

    // IMPORTANT: update database schema version before calling Execute
    schemaUpdater.DatabaseSchemaVersion = 1;
    schemaUpdater.Execute();

    return schemaUpdater.DatabaseSchemaVersion;
}

private int UpdateFromVersion1ToVersion2()
{
    DatabaseSchemaUpdater schemaUpdater = this.CreateDatabaseSchemaUpdater();

    // only add the column if the table existed before this update session
    if (!TablesAddedInThisUpdateSession.Contains("Contact"))
    {
        schemaUpdater.AddColumn<Contact>("Phone");
    }

    schemaUpdater.DatabaseSchemaVersion = 2;
    schemaUpdater.Execute();

    return schemaUpdater.DatabaseSchemaVersion;
}
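The same workaround can be wrapped in a small helper so the table check is keyed by the entity type instead of a hand-typed string, which keeps the AddTable call and the later check from drifting apart. This is a sketch only: SchemaUpdateSession, RecordTableAdded<T> and NeedsColumn<T> are hypothetical names, not part of the Windows Phone SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Windows Phone SDK): remembers which tables
// were created during one run of the update loop.
public sealed class SchemaUpdateSession
{
    private readonly HashSet<Type> tablesAdded = new HashSet<Type>();

    // Call right after schemaUpdater.AddTable<T>().
    public void RecordTableAdded<T>()
    {
        tablesAdded.Add(typeof(T));
    }

    // A table created in this run already has every column of the current entity class,
    // so a later AddColumn<T> for it must be skipped.
    public bool NeedsColumn<T>()
    {
        return !tablesAdded.Contains(typeof(T));
    }
}
```

In the example above, UpdateFromVersion0ToVersion1 would call session.RecordTableAdded<Contact>() right after AddTable<Contact>(), and UpdateFromVersion1ToVersion2 would guard the new column with if (session.NeedsColumn<Contact>()) schemaUpdater.AddColumn<Contact>("Phone");. Like the list, the session object has to live for one whole UpdateIfNeeded run.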

06 March, 2012

Asp.NET Concurrent AJAX Calls

In ASP.NET "if two concurrent requests are made for the same session (by using the same SessionID value), the first request gets exclusive access to the session information. The second request executes only after the first request is finished." (ASP.NET Session State Overview)

The keywords here are "same session".

When an AJAX call to an ASP.NET Web method is made:

[WebMethod(true)] // EnableSession = true
public static void MethodName()
{
}

If EnableSession is set to true, only one AJAX call per session runs at a time. Further calls stay pending until the running call finishes, and the order in which the pending calls then run is not guaranteed.

If EnableSession is set to false, you can make concurrent calls but will not have access to the Session. The number of simultaneous calls is limited by the browser's maximum number of concurrent connections per host.

However, if you're using ASP.NET MVC 3 or later, there is an attribute named SessionState that you can apply to a controller to make concurrent calls and still access your session (in read-only mode):

Concurrent calls:
[SessionState(System.Web.SessionState.SessionStateBehavior.ReadOnly)]

Sequential calls:
[SessionState(System.Web.SessionState.SessionStateBehavior.Required)]
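The difference between the two behaviors can be illustrated with a small, self-contained model. This is a simplified sketch, not how ASP.NET's session state module is implemented: SessionGate and SessionAccess are hypothetical names, a Required request takes an exclusive per-session lock, and a ReadOnly request takes none. It leaves out details of the real module, such as lock timeouts and how read-only requests interact with a request that currently holds the lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Simplified, hypothetical model of per-session request locking (not ASP.NET's implementation).
public enum SessionAccess { Required, ReadOnly }

public sealed class SessionGate
{
    // One exclusive lock per session ID.
    private readonly ConcurrentDictionary<string, SemaphoreSlim> locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public async Task RunAsync(string sessionId, SessionAccess access, Func<Task> handler)
    {
        if (access == SessionAccess.ReadOnly)
        {
            // No exclusive lock: read-only requests for the same session can overlap.
            await handler();
            return;
        }

        // Writable access: a second request for the same session waits here
        // until the first one releases the lock.
        SemaphoreSlim sessionLock = locks.GetOrAdd(sessionId, _ => new SemaphoreSlim(1, 1));
        await sessionLock.WaitAsync();
        try
        {
            await handler();
        }
        finally
        {
            sessionLock.Release();
        }
    }
}
```

In this model two Required calls for the same session ID run one after the other, while two ReadOnly calls overlap, which mirrors the sequential and concurrent cases above.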

21 February, 2012

WCF IEnumerable DataContracts bug

I kept getting an error while using SoapUI to create valid requests for a WCF web service hosted in IIS.
While debugging, the operation executed without any problems, but the client always got a connection error (java.net.SocketException: Connection reset).
It turns out there is a bug in WCF when serializing a DataContract with an IEnumerable<T> member. I changed it to List<T> and the problem was gone.
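A minimal sketch of the change, assuming a hypothetical OrderResponse contract (the names are illustrative, and this does not reproduce the original error): the collection member is declared as List<string>, and ToList() turns the query into a concrete list before the object leaves the operation. RoundTrip is a hypothetical helper that runs a value through DataContractSerializer, the serializer WCF uses by default for data contracts, which gives a quick way to check outside the service that a contract serializes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization;

// Hypothetical response contract: the collection member is declared as List<T>
// rather than IEnumerable<T>.
[DataContract]
public class OrderResponse
{
    [DataMember]
    public List<string> Items { get; set; }
}

// Hypothetical service-side helpers.
public static class OrderService
{
    // ToList() materializes the (illustrative) query inside the operation,
    // so the serializer receives a concrete list.
    public static OrderResponse GetOrder(IEnumerable<string> source) =>
        new OrderResponse { Items = source.Where(s => s.Length > 0).ToList() };

    // Serializes and deserializes a value with DataContractSerializer.
    public static T RoundTrip<T>(T value)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using var stream = new MemoryStream();
        serializer.WriteObject(stream, value);
        stream.Position = 0;
        return (T)serializer.ReadObject(stream);
    }
}
```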

19 February, 2012

Solution file (.sln) comments (#)

When adding comments to solution files, start the comment line with the # character.

Example:

# Comment

17 February, 2012

WCF service contract changes supported by old proxy

I needed to know which changes to a WCF service contract an old proxy could support. The tests were made with BasicHttpBinding and WSHttpBinding, using a client with a proxy generated by Visual Studio. All DataContract members (called attributes below) were strings.

My results were:

Add an Operation - OK
Remove an Operation - OK unless the client calls it.


Added Attribute to response object - OK
Added Attribute to request object - OK

Removed Attribute from response object - OK (the client interprets it as null)
Removed Attribute from request object - OK (the client interprets it as null)

Change the type of an Attribute of the response object - depends on whether the new type can be parsed as the old type from the message string.
Change the type of an Attribute of the request object - depends on whether the new type can be parsed as the old type from the message string.

Change the Attribute name in the response object - OK (the client gets null since the old attribute was removed)
Change the Attribute name in the request object - OK (the client gets null since the old attribute was removed)

All OK results indicate that no exception is thrown.
The client will not be aware of new properties or new operations, but it will not crash either (at least for nullable properties; I haven't tested types other than strings).

The results were the same for both bindings.

Other bindings, especially ones that use binary serialization, will probably behave differently.
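The serializer-level part of these results can be checked without a service. A minimal sketch, assuming WCF's default DataContractSerializer: ContactV1 and ContactV2 are hypothetical versions of one contract that share the same contract name and namespace, so a message written with one can be read with the other. It does not exercise the bindings themselves.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Two hypothetical versions of the same data contract. Sharing Name and Namespace
// makes DataContractSerializer treat them as the same contract on the wire.
[DataContract(Name = "Contact", Namespace = "http://example.com/contracts")]
public class ContactV1
{
    [DataMember] public string Name { get; set; }
}

[DataContract(Name = "Contact", Namespace = "http://example.com/contracts")]
public class ContactV2
{
    [DataMember] public string Name { get; set; }
    [DataMember] public string Phone { get; set; } // member added in the new version
}

// Hypothetical helper.
public static class ContractVersioning
{
    // Serializes with one type and deserializes with another, as an old proxy
    // does with a message produced by a newer service (or vice versa).
    public static TTarget Convert<TSource, TTarget>(TSource value)
    {
        using var stream = new MemoryStream();
        new DataContractSerializer(typeof(TSource)).WriteObject(stream, value);
        stream.Position = 0;
        return (TTarget)new DataContractSerializer(typeof(TTarget)).ReadObject(stream);
    }
}
```

Converting a ContactV2 to a ContactV1 silently drops Phone, and converting a ContactV1 to a ContactV2 leaves Phone null, in line with the "Added/Removed Attribute - OK" rows. If an old client must pass unknown members back unchanged, the contract can implement IExtensibleDataObject so they are kept in ExtensionData instead of being dropped.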

10 July, 2010

Comparing collections with LINQ

Once in a while I have to call the Distinct LINQ method and, more often than not, I don't want to compare object references. For this purpose Microsoft provides the IEqualityComparer<T> interface.
The IEqualityComparer<T> interface declares two methods: bool Equals(T x, T y) and int GetHashCode(T obj). LINQ uses both to compare objects in collections: GetHashCode to find candidate matches and Equals to confirm them.
Since I need to compare most of the business objects in my solutions, and don't want to go through the hassle of implementing tens of new classes (one per object), I wanted a generic solution.
Simple enough: I built myself the GenericComparer<T>.

using System;
using System.Collections.Generic;

public class GenericComparer<T> : IEqualityComparer<T>
{
    public GenericComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        this.equals = equals;
        this.getHashCode = getHashCode;
    }

    // expression used to decide whether two objects are equal
    readonly Func<T, T, bool> equals;
    public bool Equals(T x, T y)
    {
        return equals(x, y);
    }

    // expression used to compute an object's hash code
    readonly Func<T, int> getHashCode;
    public int GetHashCode(T obj)
    {
        return getHashCode(obj);
    }
}

This worked fine. I could finally just set my comparer directly in my LINQ command. All I had to give were the expressions to be used in the Equals and GetHashCode methods.



IEnumerable<mytype> result = collection.Distinct(new GenericComparer<mytype>((mt1, mt2) => mt1.id == mt2.id, mt => mt.id.GetHashCode()));

Still, it was more code than I would like to see. Since all I want is to get all unique elements based on a single expression, that's all I should have to write. A simple tweak to the GenericComparer class and I got a simpler comparer, SimpleGenericComparer<T>.

using System;
using System.Collections.Generic;

public class SimpleGenericComparer<T> : IEqualityComparer<T>
{
    public SimpleGenericComparer(Func<T, int> getHashCode)
    {
        this.getHashCode = getHashCode;
    }

    // two objects are considered equal when the expression returns the same value for both
    public bool Equals(T x, T y)
    {
        return getHashCode(x) == getHashCode(y);
    }

    readonly Func<T, int> getHashCode;
    public int GetHashCode(T obj)
    {
        return getHashCode(obj);
    }
}
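With the Equals method simply comparing both objects through the GetHashCode expression, I could finally just give the expression with which the comparison is made. One caveat: two objects whose keys differ but have the same hash code are treated as equal. For an int id that can't happen, since Int32.GetHashCode returns the value itself, but string keys can collide.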

IEnumerable<mytype> result = collection.Distinct(new SimpleGenericComparer<mytype>(mt => mt.id.GetHashCode()));
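A variant that keeps the single-expression call but compares the keys themselves instead of their hash codes, so colliding hash codes can't merge different elements. This is a sketch: KeyEqualityComparer<T, TKey> is a hypothetical name, and the optional keyComparer parameter is an addition for keys such as case-insensitive strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: compares elements by a key selected from each element.
// Equals compares the keys themselves, so two different keys that share a hash code stay distinct.
public class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> keySelector;
    private readonly IEqualityComparer<TKey> keyComparer;

    public KeyEqualityComparer(Func<T, TKey> keySelector, IEqualityComparer<TKey> keyComparer = null)
    {
        this.keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
        this.keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    }

    public bool Equals(T x, T y) => keyComparer.Equals(keySelector(x), keySelector(y));

    public int GetHashCode(T obj) => keyComparer.GetHashCode(keySelector(obj));
}
```

With an int id the call becomes collection.Distinct(new KeyEqualityComparer<mytype, int>(mt => mt.id)). On .NET 6 and later, collection.DistinctBy(mt => mt.id) gives the same result without a custom class.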