Friday, July 27, 2007


The name Sudoku means "the digits must occur only once". The name is a trademark of puzzle publisher NIkoli Co. Ltd. in Japan.


Other Japanese publishers refer to the puzzle as Number Place, the original U.S. title, or as "Nanpure" for short.Some non-Japanese publishers spell the title as "Su Doku". The numerals in Sudoku puzzles are used for convenience; arithmetic relationships between numerals are irrelevant. Any set of distinct symbols will do; letters, shapes, or colours may be used without altering the rules.

Dell Magazine , the puzzle's originator, has been using numerals for Number Place in its magazines since they first published it in 1979.

The attraction of the puzzle is that the rules are simple, yet the line of reasoning required to solve the puzzle may be complex. The level of difficulty can be selected to suit the audience. The puzzles are often available free from published sources and may be custom-made using software.


Strategies


The strategy for solving a puzzle may be regarded as comprising a combination of three processes: scanning, marking up, and analyzing. The approach to analysis may vary according to the concepts and the representations on which it is based.

The top right region must contain a 5. By hatching across and up from 5s elsewhere, the solver can eliminate all the empty cells in the region which cannot contain a 5. This leaves only one possibility (shaded green).
The top right region must contain a 5. By hatching across and up from 5s elsewhere, the solver can eliminate all the empty cells in the region which cannot contain a 5. This leaves only one possibility (shaded green).

Scanning

Scanning is performed at the outset and throughout the solution. Scans need to be performed only once in between analyses. Scanning consists of two techniques as follows:

  • Cross-hatching: the scanning of rows to identify which line in a region may contain a certain numeral by a process of elimination. The process is repeated with the columns. For fastest results, the numerals are scanned in order of their frequency, in sequential order. It is important to perform this process systematically, checking all of the digits 1–9.
  • Counting 1–9 in regions, rows, and columns to identify missing numerals. Counting based upon the last numeral discovered may speed up the search. It also can be the case, particularly in tougher puzzles, that the best way to ascertain the value of a cell is to count in reverse—that is, by scanning the cell's region, row, and column for values it cannot be, in order to see what remains.

Advanced solvers look for "contingencies" while scanning, narrowing a numeral's location within a row, column, or region to two or three cells. When those cells lie within the same row and region, they can be used for elimination during cross-hatching and counting. Puzzles solved by scanning alone without requiring the detection of contingencies are classified as "easy"; more difficult puzzles cannot be solved by basic scanning alone.



Marking up

Scanning stops when no further numerals can be discovered, making it necessary to engage in logical analysis. One method to guide the analysis is to mark candidate numerals in the blank cells.

Subscript notation

In subscript notation, the candidate numerals are written in subscript in the cells. Because puzzles printed in a newspaper are too small to accommodate more than a few subscript digits of normal handwriting, solvers may create a larger copy of the puzzle. Using two colors, or mixing pencil and pen marks can be helpful.

Dot notation

The dot notation uses a pattern of dots in each square, where the dot position indicates a number from 1 to 9. The dot notation can be used on the original puzzle. Dexterity is required in placing the dots, since misplaced dots or inadvertent marks inevitably lead to confusion and may not be easily erased.

An alternative technique is to mark the numerals that a cell cannot be. The cell starts empty and as more constraints become known, it slowly fills until only one mark is missing. Assuming no mistakes are made and the marks can be overwritten with the value of a cell, there is no longer a need for any erasures.


Generalised symmetries and extended Sudoku board

In "The Hidden Logic of Sudoku", a book based on a systematic logical formalisation of the game, all its generalised symmetries have been elicited, in particular between rows and numbers and between columns and numbers. A new resolution method has been developed, based on their systematic exploitation. As a result, an extended Sudoku board has been designed, which makes conjugacy links appear as bivalue cells and may ease the application of the method. Thus, Hidden subsets and fishy patterns (X-Wings, Swordfish and Jellyfish) all appear as Naked Pairs, Triplets or Quads. Within a general framework for dealing with chains, these symmetries are used to introduce new resolution rules, such as hidden xy-chains. This method has been implemented in the SudoRules solver, based on Artificial Intelligence techniques and simulating a human player.

Computer solutions

There are three general approaches taken in the creation of serious Sudoku-solving programs: human solving methods, rapid-style methods, and pure brute-force algorithms. Human-style solvers will typically operate by maintaining a mark-up matrix, and search for contingencies, matched cells, and other elements that a human solver can utilize in order to determine and exclude cell values.

Many rapid-style solvers employ backtracking searches, with various pruning techniques also being used in order to help reduce the size of the search tree. The term rapid-style may be misleading: Most human-style solvers run considerably faster than a rapid-style solver, although the latter takes less time to write and is more easily adapted to larger grids. A purely brute-force algorithm is very simple and finds a solution to a puzzle essentially by "counting" upward until a string of 81 digits is constructed which satisfies the row, column, and box constraints of the puzzle.

Rapid solvers are preferred for trial-and-error puzzle-creation algorithms, which allow for testing large numbers of partial problems for validity in a short time; human-style solvers can be employed by hand-crafting puzzlesmiths for their ability to rate the challenge of a created puzzle and show the actual solving process their target audience can be expected to follow.

Although typical Sudoku puzzles (with 9×9 grid and 3×3 regions) can be solved quickly by computer, the generalization to larger grids is known to be NP-comple. Various optimisation methods have been proposed for large grids.

Details of computer solutions may be found on the page on the Algorithmics of Sudoku.



Difficulty ratings


The difficulty of a puzzle is based on the relevance and the positioning of the given numbers rather than their quantity. Surprisingly, most of the time the number of givens does not reflect a puzzle's difficulty. Computer solvers can estimate the difficulty for a human to find the solution, based on the complexity of the solving techniques required. Some online versions offer several difficulty levels.

Most publications sort their Sudoku puzzles into four or five rating levels, although the actual cut-off points and the names of the levels themselves can vary widely. Typically, however, the titles are synonyms of "easy", "intermediate", "hard", and "challenging" (also known as "diabolical" or "evil"). An easy puzzle can be solved using only scanning; an intermediate puzzle may take markup to solve; a hard or challenging puzzle will usually take analysis.

Another approach is to rely on the experience of a group of human test solvers. Puzzles can be published with a median solving time rather than an algorithmically defined difficulty level.

Difficulty is a very complex topic, subject to much debate on the Sudoku forums, because it may depend on the concepts and visual representations one is ready to use.








No comments: