CONTENTS:
  1. INTRO
  2. FORMAL GRAMMAR
  3. OVERVIEW
  4. EXAMPLES
  5. RELATED CONVENTIONS
  6. REGULAR EXPRESSIONS
  7. OUTPUT
  8. REFERENCES

=====================================================================
1. INTRO

This is a proposal, on methods of error reporting. Target of this 
document is developers of COD (Crystallography Open Database) tools.

This kind of error reporting was developed with influence of GNU 
Programming guidelines and view of Turbo Pascal 5.5 parser reporting.
It is a mixture of two schools. Might be better, might be worse - it is 
for you to decide...

One of the principles is that most, if not all, error messages should 
fit into a single line (80 columns, at its best).
So it is easier to grep through them.

For meaning of words SHOULD, MAY, etc. please refer to rfc2119[1].

=====================================================================
2. FORMAL GRAMMAR

Below, we provide the formal grammar of the error messages in EBNF[4,5]:

(* @BEGIN EBNF *)

error_list = {[spaces], newline}, error_report, { error_report };

error_report = progname, ':', [spaces], [location], ':', [spaces],
               (status, ',', [spaces] | lowercase_word), [message],
               [ ':', [spaces], newline, { space, code_line } ],
               {newline};

code_line = [anytext], newline;

location = file_position, [spaces], [ additional_position ] ;

file_position = filename, [spaces], [ file_line_column ] ;

file_line_column = '(', lineno, [ ',', linepos ], ')';

message = text ;

lineno = integer ;

linepos = integer ;

additional_position = text ;

status = uppercase_letter, { uppercase_letter | digit | "-" | "_" } ;

filename = filename_char, { filename_char } ;

progname = filename_char, { filename_char | space } ;

anytext = anychar, {anychar} ;

(* Lexical tokens: *)

filename_char = letter | digit | unicode_symbol | quote | punctuator ;

spaces = space, {space} ;

space = " " | "\t" ;

newline = ? \n ? ;

integer = digit, {digit} ;

text = word, [spaces], { word, [spaces] } ;

lowercase_word = word_char, { lowercase_letter }, [spaces];

word = word_char, { word_char } ;

word_char = letter | digit | unicode_symbol | quote | punctuator | braces ;

anychar = word_char | space | delimiter ;

letter = lowercase_letter | uppercase_letter | "_" ;

unicode_symbol = ? [^\p{ASCII}] ?;

lowercase_letter = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j"
                 | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t"
                 | "u" | "v" | "w" | "x" | "y" | "z" ;

uppercase_letter =
             "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" 
           | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T"
           | "U" | "V" | "W" | "X" | "Y" | "Z" ;

digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

quote = '"' | "'" ;

braces = "(" | ")" | "[" | "]" | "{" | "}" ;

(* Punctuators for error messages SHOULD NOT include the ":" character: *)
punctuator = "!" | "#" | "$" | "%" | "&" | "*" | "+" | "," | "|" | "~"
           | "-" | "." | "/" | (* ":" | *) ";" | "<" | "=" | ">" 
           | "?" | "@" | "\" | "^" | "`" ;

(* Delimiters are used to separate distinct parts of the error
   message, but they may be used arbitrarily in the code line that
   follows the structured message text: *)

delimiter = ":" ;

(* @END EBNF *)

=====================================================================
3. OVERVIEW

The usual format for error message is as following:
$0: "<fn>"(<ln>,<co>) <apos>: <el>, <emsg>
Where:
    - $0 stands for script name; programs convention meaning "my name" - 
      by what name script has been called.
      In case of Perl libraries - $0 returns the name of program which 
      is calling library and this is correct behavior here.
      If the library is called using -e (here - Perl command line switch 
      indicating next to it written code execution; for similar 
      behavior in other programming languages - take a look at 
      respective manual) instead of '$0: <fn>...' we SHOULD output only 
      '<fn>...' as it appears more readable.
    - <fn> means file name. The name with user provided path to the file 
      being currently processed.
      If information is read from STDIN, then we should write 'file "-"' 
      as dash ('-') stands for STDIN in UNIX CLI scripts. In other cases 
      prefix 'file "<filename>"' is not required as it is more 
      intuitive.
    - <ln> means line number in which error was detected. Optional.
    - <co> stands for column number. Optional.
    - <apos> means Additional POSition information. I.e. for CIF files 
    we could place here DATA block name (in format data_<title> - so, as 
    it appears in file). Optional.
    - <el> - emergency level. Optional. Any single word, or script
      error-level constant. MAY be followed by colon, SHOULD be in 
      uppercase, if it is a word.
    - <emsg> is error message as minimalistic and understandable as it 
    is possible.

=====================================================================
4. EXAMPLES

command:: message - explanation
command: file(10,20): message - explanation
command: file(10,20) data_xyz: message - explanation
command:: ERROR, message - explanation
command: file(10,20): ERROR, message - explanation
command: file(10,20): WARNING, message - explanation
command: file(10,20) data_xyz: ERROR, message - explanation
command: file(10,20) data_xyz: ERROR 1234, message - explanation
command: file(10,20) data_xyz: WARNING, message - explanation
command: file(10,20) data_xyz: NOTE, message - explanation

command:: NOTE, message about file "filename.ext", data_xyz - explanation

The 'explanation' part is a system error message if applicable; in *x
type systems, strerror(errno) gives the suitable message.

e.g.:

command: filename: ERROR, can't open - no such file or directory
command: filename: ERROR, can't open file - no such file or directory
command: filename: ERROR, can't open file for reading - no such file or directory
command: filename: ERROR, can't open for reading - no such file or directory

Examples of error reporting:
Example 1:
./cif_filter: tests/cif_filter_004.opt(1): ERROR, no data block heading (i.e. data_somecif) found

Example 2:
./cif_cod_check: comments_in_text.cif data_test: NOTE, _journal_name_full is undefined

Example 3:
./cif_parse_new: tests/cif_cod_check_006.inp(34,38) data_2009397: ERROR, incorrect CIF syntax:
 _symmetry_space_group_name_H-M  'R 3' ccc
                                       ^
Example 4:
./cif_validate: inputs/cif_core.dic: ERROR, no tags were defined in dictionaries

Example 5:
./perl-scripts/cif_validate.pm: inputs/cif_core.dic: unable to open file 'inputs/cif_core.dic' for reading - No such file or directory.

#===============================================================================
5. RELATED CONVENTIONS

The GNU project uses similar error format.

GNU error message formats:

program: message
program:source-file-name:lineno: message
program:source-file-name:lineno:column: message

We could generalize this format to match our needs as follows:

program: NOTE, message
program:source-file-name:lineno: ERROR, message
program:source-file-name:lineno:column: WARNING, message
program:source-file-name:lineno:column: WARNING, message - reason

=====================================================================
6. REGULAR EXPRESSIONS

This section is intended for those, who think that regular expressions 
are more readable than any generalization about generic forms.

This is an example Perl script, which returns all possible parameters:
@BEGIN PERL
    my $FNAME  = qr/[a-z0-9_,\-\.\/]+/si;
    my $ELEVEL = qr/[A-Z0-9]+/si;
    $_ =~ m/
        ($FNAME)?
        :\s*
        (?:
            ($FNAME)
            (?:\(
                ([0-9]+)
                  (?:
                    ,\s*
                    ([0-9]+)
                  )?
            \))?
            (?:
                \s*data_
                ([^:]+)
            )?
        )
        :
        (?:
            \s+
            ($ELEVEL)
            :?
        )?
        \s*
        (.*)
        /six;
@END PERL

After execution of this regular expression against error message (here 
stored as $_) you would get:
    - $1 as program name;
    - $2 could be error level if any present;
    - $3 as file being processed name;
    - $4 as line number of file being processed;
    - $5 as column number of file being processed in line (look $3);
    - $6 as data block name (name <apos> in section 3.);
    - $7 as error message returned.
$2, $3, $4 and $5 are optional parameters and could be left blank, so 
you should check whereas these values exist before printing them... ;)

=====================================================================
7. OUTPUT

Create your scripts with respect. Respect towards users and computers - 
you never know, who will be your next reader.

So far as our scripts are concerned - we will put all our efforts on 
making error reporting according to these guidelines, so you should be 
safe to set up your tools to check for errors in this format.

=====================================================================
8. REFERENCES

[1] S. Bradner, RFC2119 "Key words for use in RFCs to Indicate
    Requirement Levels", http://tools.ietf.org/html/rfc2119

[2] Wikipedia, The Free Encyclopedia, "Backus–Naur Form",
    http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form

[3] Free Software Foundation, "GNU Bison - The Yacc-compatible Parser
    Generator", https://www.gnu.org/software/bison/manual/

[4] Wikipedia, The Free Encyclopedia, "Extended Backus–Naur Form",
    https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form

[5] ISO/IEC 14977 : 1996(E), "Information technology — Syntactic
    metalanguage — Extended BNF",
    http://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf
