04 April, 2012

Split a Large XML File into Smaller Valid XML Files using LINQ to XML

This post shows how to split a large XML file into a fixed number of smaller files using LINQ to XML.

The main performance hit is the initial load of the whole XML file into memory (I still need to find a way to load each node only as needed). There is certainly room for improvement, but for now it gets the job done.

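using System;
using System.Linq;
using System.Xml.Linq;

string path = @"c:\large.xml";
int nrParts = 220;

// Load the whole document into memory.
XElement root = XElement.Load(path);
// Empty copy of the root: same name and attributes, no child nodes.
// (RemoveAll() on a full copy would clone the entire tree and also strip the attributes.)
XElement cleanedRoot = new XElement(root.Name, root.Attributes());
XNode[] children = root.Nodes().ToArray();
int childrenCount = children.Length;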

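// Nodes per part, rounded up so that nrParts parts are enough to hold every node.
int nodeCountPerPart = (int)Math.Ceiling(childrenCount / (double)nrParts);

for (int i = 0; i < nrParts; i++)
{
    int indexBase = i * nodeCountPerPart;
    // Because of the rounding, the nodes can run out before nrParts is reached;
    // stop here instead of saving empty files.
    if (indexBase >= childrenCount) break;

    XElement newRoot = new XElement(cleanedRoot);
    int nodesInPart = Math.Min(nodeCountPerPart, childrenCount - indexBase);
    for (int j = 0; j < nodesInPart; j++)
    {
        // The nodes still belong to root, so Add() stores a copy of each one.
        newRoot.Add(children[indexBase + j]);
    }

    newRoot.Save(string.Format(@"c:\large_part{0}.xml", i + 1));
}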
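
To see how the rounding in nodeCountPerPart distributes the nodes, here is a small sketch that only computes the part sizes. PartSizeDemo and PartSizes are hypothetical names used for illustration; they are not part of the code above.

```csharp
using System;
using System.Collections.Generic;

public static class PartSizeDemo
{
    // Returns how many nodes each part receives, in order.
    // Uses the same ceiling division as the splitting code above.
    public static List<int> PartSizes(int childrenCount, int nrParts)
    {
        if (nrParts < 1) throw new ArgumentOutOfRangeException(nameof(nrParts));

        int nodeCountPerPart = (int)Math.Ceiling(childrenCount / (double)nrParts);
        var sizes = new List<int>();
        for (int indexBase = 0; indexBase < childrenCount; indexBase += nodeCountPerPart)
        {
            // The last part takes whatever is left.
            sizes.Add(Math.Min(nodeCountPerPart, childrenCount - indexBase));
        }
        return sizes;
    }
}
```

For example, PartSizes(10, 4) gives 3, 3, 3, 1, while PartSizes(5, 4) gives 2, 2, 1: because of the rounding, only three of the four requested parts receive any nodes.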


UPDATE: Using this method, +Master Aucrun has created a simple WinForms project to easily split an XML file. Thanks, +Master Aucrun. You can get it here.
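
As for loading each node only as needed: one possible approach, sketched below and not benchmarked, is to walk the file with an XmlReader and materialize one child element at a time with XNode.ReadFrom. The names StreamingXmlSplitter, Split and partPath are made up for this sketch. It makes some simplifying assumptions: since the total number of children is not known up front, it splits by nodes per part rather than by number of parts; it copies only the root's name (not its attributes); and it skips comments and text that sit directly under the root.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class StreamingXmlSplitter
{
    // Splits inputPath into files holding at most nodesPerPart child elements each.
    // partPath maps a 1-based part number to an output file name.
    // Returns the number of files written.
    public static int Split(string inputPath, int nodesPerPart, Func<int, string> partPath)
    {
        if (nodesPerPart < 1) throw new ArgumentOutOfRangeException(nameof(nodesPerPart));

        int partCount = 0;
        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            reader.MoveToContent(); // position the reader on the root element
            XName rootName = XName.Get(reader.LocalName, reader.NamespaceURI);

            XElement part = new XElement(rootName);
            int inPart = 0;
            foreach (XElement child in ReadChildElements(reader))
            {
                part.Add(child);
                if (++inPart == nodesPerPart)
                {
                    part.Save(partPath(++partCount));
                    part = new XElement(rootName); // start the next part
                    inPart = 0;
                }
            }
            // Save the remainder, if any.
            if (inPart > 0) part.Save(partPath(++partCount));
        }
        return partCount;
    }

    // Lazily yields the root's child elements one by one.
    // Expects the reader to be positioned on the root element.
    private static IEnumerable<XElement> ReadChildElements(XmlReader reader)
    {
        reader.Read(); // step inside the root
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                // ReadFrom builds the element and leaves the reader on the node after it,
                // so no extra Read() is needed here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read(); // skip whitespace, comments and the root's end tag
            }
        }
    }
}
```

It could be called like this: StreamingXmlSplitter.Split(@"c:\large.xml", 1000, i => string.Format(@"c:\large_part{0}.xml", i)). Only the nodes of the part currently being built are held in memory, at the cost of not being able to choose the number of parts directly.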