This post is incredibly technical – it is software engineer at a deep level. I’m not sorry.
I have a working Haskell program to solve Sudoku puzzles; right now, the board name is hard-coded, but I’m at least reading from a file, and I am quite sure there are boards I cannot solve – given my method is incomplete.
But next I am going to work on a program to run tests fr. outside Haskell, and other improvements.
The code is 17 lines in the main file (plus forty lines of “test” code); and SBoard.hs is 261 lines total. Finally, 24 lines of file reading: one main function, one sub-function and a one-liner for typed Integer square root of an Int
*BoardIO> :t sqrtInt
sqrtInt :: (Integral b, Integral a) => a -> b
(I’m no longer sure why I did that, but here’s the actual code:
sqrtInt i = floor $ sqrtFloat $ 1.0 * (fromIntegral i)
Now then, the primary function is:
main =
do
b <- doRead "n.sdk"
-- print b
-- return (solve b)
solv_dbg b
As you can see, I orignally called solve, which takes a Board as input and returns a Board, but since main has an IO monad, I needed the return there — which has nothing to do with “return” as most programmers know it and love it.
Sans putStrLn and print expressions, here is my main function for solving:
solve b =
let b1 = calcUniqs b
b2 = calcOpen b1
in
if b2 == b
then b
else solve b2
calcUniqs
To examine the function to calculate “unique” cells (by that I mean cells where only one value is posssible – being forced by being unique), first I need to example this utility function
{-
calculate a cell - if it's already known, done;
if there's only one possibility, done
otherwise, it's zero for now
-}
calc_cell c =
let d = fst c
u = snd c
in
if d > 0
then d
else if length u == 1
then head u
else 0
This is used in the “final” expression, below. Given tuples (a, [a]) wehere a is an Ord and also a Num (due to its usage in the function); the first is the current value of the cell (zero = blank), and the second part is the list of “possible” values for this cell — it’s possible, given the calculation of this list, that there may be multiple values even though the cell is already solved, but this function doesn’t care. The logic is,
If this “cell” is already calculated, then we’re done, return that value; else, if the list (of possible values) length is 1, then that is the new value; otherwise, the value is (still) unknown/blank, so return zero.
I should explain, this “fell out” naturally from the way I decided to encode the Board itself, and also the resulting possible values and so on.
type Symbols = [Char]
data Board = B Symbols Int [String]
A Board is, initially, a list of symbols (identical to a string, by the way), an integer which is the square root of symbols (for a 9×9 board, this is 3), then a list of Strings (technically, I ought to make that a list of symbols, I suppose…)
One very important helper function converts the list of strings — each of which is one row — into a list of lists of integers;
“toSolve”
The following is the “heart” of most of the calcs that follow (and precede)
{-
convert a board into an array of "cells,"
where each cell is (d, [Ints]) - if d is zero, the cell is open
the list of integers represents the possible* values there
(* note that is not true if d is non-zero, but in that case, we don't care)
-}
toSolve b =
let
rs = arrFromBoard b
cs = transpose rs
ps = patches b
unf_r = map avail rs
unf_c = map avail cs
unf_p = map avail ps
sq = square b
unf = map (\x -> map (intersect x) unf_c) unf_r
-- the following creates an "array" (list of lists)
-- which is the same dimensions as arrFromBoard
idx_p = map (\r -> map (\c -> sq * (sect r sq) + (sect c sq))
[0..(sq*sq-1)]) [0..(sq*sq-1)]
cmplx = map (\r -> zip (fst r) (snd r)) $ zip unf idx_p
unf2 = map (\r -> map (\i -> intersect (fst i) (unf_p !! snd i) ) r) cmplx
in
map (\r -> zip (fst r) (snd r)) $ zip rs unf2
From the inside out, the underlined map takes a list (e.g. “[0..8]“) of column indicies and returns a list of indicies of the patch for this “cell” (r, [0..8]). That map is within a func. mapped over a list of row indicies.
At the end of this, idx_p is a list of lists which represents the patch index of the corresponding cell in the board array.
So, the let section of this calculates the rows (rs), then the columns (cs), then the patches (ps) — my term for the sub-squares, 3×3 on a 9×9 board; the remaining values per row (unf_r), column (unf_c) and patch (unf_c). Next, unf represents the “intersection” of remaining values per row and column.
Now “cmpx” is an array of (p, [Int]) tuples – same dimensions as the Sudoku board — twice is use the “pattern”
map (\r (zip (fst r) (snd r)) $ zip <something> <else>
Where <something> and <else> represent matching arrays; in one case, “elements” of <something>” are lists, in the other, elements of <else> are lists. The result is an array of tuples (s_i, e_i) where each “s” and each “e” are an element of <something> and <else> respectively.
In both cases, <something> and <else> are lists of lists of elements – either [[a]], or [[[a]]]; in the easier case, a list of rows, where each row is a list of columns, where each column is a single value (an Int, say) which is the value of a cell; in the more complex case, a list of rows, of columns, where each column ‘value’ is itself a list of some data representing a list of possible values, for instance, for this cell.
… and that is perhaps the most complex description I’ll write. (I only hope that, should I come back to read this later, it will still make sense.)
Finally
calcOpen b =
let syms = symbols b
in
-- the following assumes that the blank space (" ") is not a symbol
B syms (square b) $ arrToSyms (map (\r -> map calc_cell r) (toSolve b)) syms ' '
So this calls toSolve on b, the Board, then “map (\r -> map…” because it’s a list of lists calls calc_cell on each element — and that’s why calc_cell takes the input it does.
The function toSolve takes a Board as input, as we’ve seen, and returns an array [a list of lists] of cells, where each cell is a tuple, which can be input to calc_cell far above.
After the map-map, the result is an array of new values, zero thru nine for a 9×9 board, and arrToSyms <array> syms ' ' converts that back to the [String] data to store in the Board. “B...” is the board constructor.
show
The following code, in three functions, allows me to (pretty) print a Board
rowToStr r s =
if length r > s
then show (take s r) ++ " " ++ (rowToStr (drop s r) s)
else show r
extraRow a s =
if length a > s
then (take s a) ++ [[]] ++ (extraRow (drop s a) s)
else a
instance Show Board where
show b =
let s = square b
rows = map (\r -> rowToStr r s) $ arrFromBoard b
in unlines $ extraRow rows s
The magic, which I ought to remember, is in “instance Show Board where\n show…”
The function takes a Board [type] and returns a string.
What I’ve learned
There’s a lot to write here, but I’ve started to get the “hang” of monds — which I’ll have to write later, but others have done better.
Most important, for now, is this: the book Real World Haskell recommends that “beginners” should write the type signature first, to validate the code they get is what they intended. I, however, took only half that advise — I thought carefully about what I wanted, but coded the function then checked the result, without including the “declaration.” (I think I’d get the book, if I can find an updated version; there is a lot more in there than I know, yet.)