
Getting XML data March 9, 2009

Posted by svenm in Notes/Domino.

I am working on a project that imports data from an XML feed.

The idea is simple: I am given an XML file and import it. After that, I receive a new file once a week to keep the original data up to date.

How does this work? The file contains person data, and the action attribute of each Person tag tells me what to do: add, change or delete.

  • <Person id="10183" action="add" date="18-Sep-2008">
  • <Person id="11324" action="chg" date="25-Dec-2008">
  • <Person id="588334" action="del" date="26-Dec-2008">
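To make the dispatch on the action attribute concrete, here is a minimal sketch of a SAX handler that reads the id and action of each Person tag. The class name PersonFeedHandler and the log list are my own placeholders, not the code from the actual import; a real handler would create, update or delete Notes documents where this one only records what it would do.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: the name and the log format are illustrative only.
class PersonFeedHandler extends DefaultHandler {
    // Records the planned operation per person, e.g. "add:10183".
    final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (!"Person".equals(qName)) {
            return; // other elements are ignored in this sketch
        }
        String id = attrs.getValue("id");
        String action = attrs.getValue("action");
        if ("add".equals(action) || "chg".equals(action) || "del".equals(action)) {
            log.add(action + ":" + id);   // a real import would touch the database here
        } else {
            log.add("unknown:" + id);     // unexpected or missing action value
        }
    }

    // Streams the XML through the handler and returns the recorded operations.
    static List<String> parse(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PersonFeedHandler handler = new PersonFeedHandler();
        parser.parse(in, handler);
        return handler.log;
    }
}
```

Because SAX streams the document instead of loading it into memory, the same handler should work regardless of file size; only the work done per Person tag determines how long the import takes.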

So far, so good. The only downside is that the original file (the one I have to start from) is 626 MB. Believe me: parsing this file is very time-consuming. I managed to split the file into two separate XML files, and after the first import (almost 315 MB) my Notes database already contained about 2,000,000 documents and was a little over 2 GB. I’m now importing the second part (the final 312 MB).

At the moment I import the data by manually selecting a file and running my code in a Notes client. But this has to change. In the future I need a way to download the weekly update (a zip file) and import the data through the backend. I can calculate the name of the file because it is based on the year and month.
So far I have found a webpage that explains how to extract a zip file in Java. All I have to do now is find a way to connect to an HTTPS site, provide a username and password, and download the zip file. Once I can do that, I can unzip the file and launch my SAX parser.
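The steps described above (build the file name, download over HTTPS with credentials, unzip) could be sketched as follows. This is not the solution I ended up with, only an outline under assumptions: the site is assumed to accept HTTP Basic authentication, the "update-YYYY-MM.zip" naming pattern is invented for illustration, and the class name WeeklyUpdate is a placeholder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class; names and the file pattern are assumptions.
class WeeklyUpdate {
    // Assumed naming scheme: the real feed's pattern is not known here.
    static String fileName(int year, int month) {
        return String.format(Locale.ROOT, "update-%04d-%02d.zip", year, month);
    }

    // Builds the Authorization header value, assuming the site uses Basic auth.
    static String basicAuthHeader(String user, String password) {
        byte[] pair = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(pair);
    }

    // Downloads the given URL to a local file, sending the credentials.
    static void download(String url, String user, String password, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        try {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + conn.getResponseCode());
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }

    // Extracts every entry of a zip stream into targetDir.
    static void unzip(InputStream zipped, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) { // refuse entries such as "../x"
                    throw new IOException("Entry escapes target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

If the site uses a login form or client certificates instead of Basic authentication, only the download method would need to change; the file name and unzip steps stay the same.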

Well, my knowledge of Java is very poor. If any of you could tell me how to download a file from an HTTPS site, that would help a lot!

If I get this working, I will share a sample database containing the code to download the zip file, extract it and parse the XML.

Comments»

1. Tim Tripcony - March 9, 2009

Sven,

Julian has some code that has frequently served me well:

http://www.nsftools.com/tips/JavaTips.htm#getinternetfile

It doesn’t specifically address providing login credentials, but this should give you a head start.

2. Retrieving files from HTTPS site using Java « SvenM - March 13, 2009

[...] few days ago I had this problem that I posted on this blog. I needed to find a way to get a file from an HTTPS site. Now retrieving files from FTP or even [...]

3. Unable to extend an ID table - insufficient memory « SvenM - March 23, 2009

[...] Tags: Administration, server trackback Now this is nice. The database that I was talking about the other day is having a serious problem: I can no longer open the damn thing! Due to the heavy import load it [...]