Archive for August, 2011

Commentary and Updates to my Clojure Code

August 13, 2011

So I realize my previous post was less descriptive than it could’ve been.  And it still will be, but here is the story.

The “project” I’m working on is to open a Fiddler session archive — that tool captures HTTP traffic, and can save the results…

The session archive I am interested is Tracer results from PegaSystems PRPC tools (PegaRules Process Commander), which is a series of XML results from the server (requested by the IE window open on my desktop requesting such).

So unless you’re extracting XML from a Fiddler session archive, this code won’t, by itself, be meaningful to you, but a smart programmer could adapt some of this for various purposes.  And, hopefully, the concepts may be interesting.

That said, let’s get back to the code:

I would’ve liked to replace iterEltFn with something like map, but the SingleNodeIterator won’t coerce to a sequence… Someone want to wrap it for me, I’d take that.

The main function parseIndex, changed to just this:

    (save_xml_doc (iter2addXml nil (.elements t)) "myResults" nil)

The function iter2addXml returns the combined xml document.

(defn iter2addXml [d sni]
  "If the SingleNodeIterator [sni] has more nodes, call_readXml on nextNodes,
    then if doc is nil, add_events to a new_xml_result with the contents,
    else add_events to the existing doc
   otherwise (sni has no more nodes), return the doc as-is."
  (if (.hasMoreNodes sni)
    (let [n (.nextNode sni)
	  nxt (call_readXml n)
	  i 0]
      (if (nil? nxt)
	(recur d sni)
	(let [d2 (if (nil? d)
		   (new_xml_result)
		   d)
	      c (:content nxt)
	      n (add_events d2 c)]
	  ; here we will do size checking things
	  (recur n sni)
	  )))
    ;; else (no more nodes) just return the doc itself, we're done
    d))

(I realize the the code highlighter I’d used gets messed up by WordPress)  The logic is: if there are more nodes, get those, do call_readXml on the results, if the result is nil, recur without it; else add_events to either a new document (via new_xml_result) or the existing doc, then recurse; if there are no more nodes, return the existing doc.

The function call_readXml is, in my opinion, a somewhat elegant “one-liner.”

(defn call_readXml [e]
  (readXmlFile (uri_to_filename (.. e (getFirstChild) (getChildren)
				    (extractAllNodesThatMatch (new HasChildFilter
								   (new StringFilter "S")))
				    (elementAt 0)(extractLink)))))

Granted, most of this is basically a string of Java functions (“e.getFirstChild().getChildren().extractAllNodesThatMatch(...).elementAt(0).extractLink“); then pass the result to uri_to_filename and pass that result to readXmlFile.

  1. ;; read an XML file — more to do here
  2. ;; N.B. “fn” as a param means “filename” not “function”
  3. (defn readXmlFile [fn]
  4.   ;; read the filename
  5.   (let [buffRd (new java.io.BufferedReader (new java.io.FileReader fn))
  6.     lns (slurp buffRd)]
  7.     ;; if the file contents (lns = lines [of the file])
  8.     ;; starts with an HTTP <<status>>, continue
  9.     (if (. lns startsWith “HTTP”)
  10.       ;; split the next line (the hard way), find the whitespace of this (the status) line —
  11.       ;; what remains is the “status,” number and text —
  12.       ;; and the linebreak of the next line
  13.       (let [lnbrk (.indexOf lns \n)
  14.         dt (.substring lns (+ lnbrk 1))
  15.         idxSp (.indexOf lns ” “)
  16.         status (. lns substring (+ 1 idxSp) (– lnbrk 1))
  17.         eol (.indexOf dt \n)]
  18.         ; (println “HTTP status:” (. lns substring (+ 1 idxSp) (- lnbrk 1 )))
  19.     (if (not (= status “302 Found”))
  20.       ;; call the hdr function…
  21.       (let [[hdr r2] (hdr (struct-map header
  22.                 :status status
  23.                 :date (.substring dt 6 (– eol 1)))
  24.                   (.substring dt (+ eol 1)))]
  25.         (println “header is” hdr)
  26.         ;; then parse the remaining data from the file
  27.         (clojure.xml/parse (new java.io.ByteArrayInputStream (.getBytes r2))))
  28.       ;; if this was a “302 Found” status, we’re dealing with a VIP exchange,
  29.       ;; there’s nothing useful here…
  30.       (println “Location is:” (. dt substring 10 eol))))
  31.     )))

Similar to what was before, but condensed twice: I could do more, like
“(slurp (buffRd (new java.io.BufferedReader…” and more: using duck-streams — I’m just thinking out loud.

And the hdr function has been re-structured for easier reading:

(defn hdr [sm str]
(let [[ln rem] (nlr str)]
(cond
(.startsWith ln "Cache-Control:") (recur (add_val :control (.substring ln 15) sm) rem)
(.startsWith ln "Keep-Alive:") (recur (add_val :keep-alive (.substring ln 12) sm) rem)
(.startsWith ln "Content-Type:") (recur (add_val :type (.substring ln 14) sm) rem)
(.startsWith ln "Connection:") (recur (add_val :connection (.substring ln 12) sm) rem)
(.startsWith ln "Content-Length:") (recur (add_val :length (.substring ln 16) sm) rem)
(.startsWith ln "Content-Language:") (recur (add_val :language (.substring ln 18) sm) rem)
(= 0 (count (.trim ln))) (list sm rem)
true
(println "no match and length wasn't zero (i.e. unknown, non-blank line):"
(count ln) "'" ln "'"))))

This is a new function:

(defn add_events [d events] ; add these trace events (contents) to the xml doc
(type events) "]")
(if (nil? events) d ; if events was nil, just return the doc unchanged
 ;; otherwise
(let [a (attrs d)
      r (reduce (fn [v i] (conj v i)) (or (:content d) []) events)] (assoc d :content r :attrs (assoc a :count (+ (:count a) (count events)))))))

I realize I’m writing far too much code here. Sorry.
I did wind up re-writing the XML emitting code, but that’s for another day.

Advertisements

My Clojure Code before Updates

August 8, 2011

I’m posting the code for my “project” — the technical details of which are terse.

I’m not going to describe a lot, here, but as I go…

(If you’re not a clojure coder, look away)

  1. ;; Fiddlr_To_Xml
  2. ;; my project to convert a Fiddler2 Session Archive (.SAZ) to XML file(s)
  3. ;; for info on Fiddler2 see http://fiddler2.com/fiddler2/
  4. ;; for this to work (the unzipping of the .SAZ, anyway)
  5. ;; I’m depending on 7-Zip (as you can see below)
  6. (require ‘clojure.contrib.shell)  ; not recommended, but to get things working…
  7. (def zipExe \”C:\\Program Files\\7-Zip\\7z.exe\”)  ; constant strings
  8. (def FiddlrCaptsDir “C:\\Documents and Settings\\gparks1\\My Documents\\Fiddler2\\Captures”)
  9. ;; given a filename (a fiddler session arhive), unzip it
  10. (defn unzip [fn]
  11.     (println “unzipping “ fn “…”)
  12.     (let [res (clojure.contrib.shell/sh zipExe “x” fn “-y” :dir FiddlrCaptsDir)]
  13.                     ;(prn res)
  14.       (println “Exit Code: “ (:exit res))
  15.       res))
  16. ; for whatever reason, I’m not getting a real exit code, but it works, so I don’t care

(via http://quickhighlighter.com)

Now, the following felt like real functional programming (because I’m actually treating a function as a first class object):

;; my original function for printing (all) the elements of a SimpleNodeIterator
; (defn printElt [i]
;  (if (. i hasMoreNodes)
;    (let [n (. i nextNode)]
;      (println n)
;      (printElt i))));; my revised helper function — used for many things, including printing
;; given a SimpleNodeIterator, and a function
;; iterate and call fn for each
;;
;; (I bet there’s an easier way, but this works for now)
;; I was just happy to be passing a function around
;; as an argument to another function…
(defn iterEltFn [i fn]
(if (. i hasMoreNodes)
(let [n (. i nextNode)]
(fn n)
(recur i fn))))

And using that here and here:

;; new and improved printElt — call iterEltFn w/ a function (println)

(defn printElt [i]

(iterEltFn i (fn [x] (println x))))

;; 2nd function using iterEltFn; this time, print the element, and the children

(defn printEltChild [i]

(iterEltFn i (fn [x]

(println x)

(printElt (.. x (getChildren) (elements)))

(println)

)))

The following defines a “constant,” two helper functions and a structure (used below):

; ‘global’ filter on tracer string
(def fStr (new org.htmlparser.filters.StringFilter “/prweb/PRTraceServlet?pzDebugRequest=Trace”)); helper function to take a URI and convert to a FileName string
(defn uri_to_filename [uri]
(if (= 0 (.. uri (substring 0 6) (toUpperCase) (compareTo “FILE:/”)))
(let [r (. uri substring 6)]
(.. r (replace “%20” ” “) (replace “/” \\)))));; helper function to split a string into this line (up to CR), and the remainder…
(defn nlr [s]  ; Next Line and Rest
(let [cr (.indexOf s \n)
l (.substring s 0 cr)
r (.substring s (+ cr 1))] (list l r)))

;; creating a structure for HTTP header
(defstruct header :status :date :keep-alive :length :control :connection :type :language)

This header function is the first of things to re-do, when I move forward:

;; hdr function
;; parse the lines from a Fiddler2 server response file
;;
;; here is an ‘example’
;;    HTTP/1.1 200 OK
;;    Date: Wed, 06 Jul 2011 18:22:56 GMT
;;    Cache-Control: max-age=0
;;    Keep-Alive: timeout=10, max=980
;;    Connection: Keep-Alive
;;    Content-Type: text/xml;charset=UTF-8
;;    Content-Language: en-US
;;    Content-Length: 76960
;;    
;; (note the ‘blank’ line at the end!)
;; the HTTP (first line, w/ status) and Date have been parsed separately, so the str argument here
;; begins at the 3rd line.  Although there is some semblence of “order” to the lines, it is not dependent on such,
;; save that the blank line be last and the last line is blank.
;; [And there’s probably a better way to structure this, as well…]
(defn hdr [sm str]
(let [[ln rem] (nlr str)]
(if (.startsWith ln  “Cache-Control:”)
(let [cc (.substring ln 15)]
(assert (nil? (:control sm)))
(recur (assoc sm :control cc) rem))
(if (.startsWith ln “Keep-Alive:”)
(let [keep (.substring ln 12)]
(assert (nil? (:keep-alive sm)))
(recur (assoc sm :keep-alive keep) rem))
(if (.startsWith ln “Content-Type:”)
(let [typ (.substring ln 14)]
(assert (nil? (:type sm)))
(recur (assoc sm :type typ) rem))
(if (.startsWith ln “Connection:”)
(let [conn (.substring ln 12)]
(assert (nil? (:connection sm)))
(recur (assoc sm :connection conn) rem))
(if (.startsWith ln “Content-Length:”)
(let [len (.substring ln 16)]
(assert (nil? (:length sm)))
(recur (assoc sm :length len) rem))
(if (.startsWith ln “Content-Language:”)
(let [lang (.substring ln 18)]
(assert (nil? (:language sm)))
(recur (assoc sm :language lang) rem))
(if (= 0 (count (.trim ln)))
(list sm rem)
(println “count wasn’t zero:” (count ln)))))))))))

The rest of this entry is presented in reverse order, so the logical flow is evident.

Here is the main function:

;; ****************************************************************************************************
;; * main function to parse an index
;; (no params necessary — the filename is at a well-known location)
;; calls printRowInfo, which does all the heavy lifting
;; (this could actually be re-structured to do the primary filtering here,
;;   and printRowInfo could be simplified…)
;; ****************************************************************************************************
(defn parseIndex []
  (let [parser (new org.htmlparser.Parser (str “file:///” FiddlrCaptsDir \\_index.htm”))
  ; iter (. parser elements) 
  ; fChStr (new org.htmlparser.filters.HasChildFilter fStr)
fCh2Str (new org.htmlparser.filters.HasChildFilter
(new org.htmlparser.filters.HasChildFilter fStr))
t (. parser parse fCh2Str)  ; a nodelist!
]
; (printElt iter); (printEltChild (. t elements))
    (printRowInfo (. t elements))

  )
)

Which calls printRowInfo for the entire list of elements:
(comments “extracted” from quickhighlighting for “clarity”

;; ****************************************************************************************************
;; main (internal) function
;; given a SingleNodeIterator argument, iterate
;; the function passed to iterEltFn gets the text (already known to contain "pzDebugRequest=Trace"),
;; and the server filename (from the children of the first child) --
;; the link from the first element of the list, which is the anchor containing the text "S" of the children of the 1st TD elt
;; [sSvrFile from datas, which is getFirstChild [TD] getChildren extractAllNodesThatMatch
;;     {[anchors] with children which are strings equal to "S"} elementAt 0, extractLink
;; each iteration calls readXmlFile
;; ****************************************************************************************************

Of interest, see that this function relies exclusively on iterEltFn and only creates an (anonymous) function to pass to it — I could make this a defined function, separating that work out

(defn printRowInfo [r]
(iterEltFn r (fn [n]
(let [datas (. (. n getFirstChild) getChildren) ; slow way to call a member of a result
; the children of the first child (a “TableColumn” [TD]) are:
; LinkTag [A] w/ text ‘C’ (Client)
; LinkTag [A] w/ text ‘S’ (Server)
; LinkTag [A] w/ text ‘M’ (M___)
fChStr (new org.htmlparser.filters.HasChildFilter fStr)
; the following is a short-cut
; n.getChildren().extractAllNodesThatMatch(fChStr).elementAt(0).getFirstChild()…
txt (.. n (getChildren) (extractAllNodesThatMatch fChStr) (elementAt 0)
(getFirstChild))
sSvrFile (.. datas (extractAllNodesThatMatch
(new org.htmlparser.filters.HasChildFilter
(new org.htmlparser.filters.StringFilter “S”)))
(elementAt 0)(extractLink))
]
; (println “Node =” n)
; (println “first child (data) w/ three <A> tags:” datas)
; (println “text node is” txt)
; (println “file is ” (uri_to_filename sSvrFile))
(readXmlFile (uri_to_filename sSvrFile))))))

And, finally, a good, stand-alone function to read an XML file:

;; read an XML file — more to do here
;; N.B. “fn” as a param means “filename” not “function”
(defn readXmlFile [fn]
;; read the filename
(let [buffRd (new java.io.BufferedReader (new java.io.FileReader fn))
lns (slurp buffRd)
h (first lns)]
;; if the file contents (lns = lines [of the file])
;; starts with an HTTP <<status>>, continue
(if (. lns startsWith “HTTP”)
;; split the next line (the hard way), find the whitespace of this (the status) line —
;; what remains is the “status,” number and text —
;; and the linebreak of the next line
(let [lnbrk (.indexOf lns \n)
dt (.substring lns (+ lnbrk 1))
idxSp (.indexOf lns ” “)
status (. lns substring (+ 1 idxSp) (– lnbrk 1))
eol (.indexOf dt \n)
]
; (println “HTTP status:” (. lns substring (+ 1 idxSp) (- lnbrk 1 )))
(if (not (= status “302 Found”))
;; call the hdr function…
(let [[hdr r2] (hdr (struct-map header :status status :date (.substring dt 6 eol))
(.substring dt (+ eol 1)))]
(println “header is” hdr)
;; then parse the remaining data from the file
(let [p (clojure.xml/parse (new java.io.ByteArrayInputStream (.getBytes r2)))]
;; ** more to do here **
p)
)
;; if this was a “302 Found” status, we’re dealing with a VIP exchange, and there’s nothing useful here…
(println “Location is:” (. dt substring 10 eol))
))
)))

Commentary, what there is, to follow…