Programmer’s Notepad has long needed an improved regular expressions engine. Currently PN uses PCRE for every task except searching Scintilla. This is because PCRE can only search a memory buffer; it doesn’t support iterators. We need iterator (or indirect access) support because a regex engine for a text editor can’t expect all of the editor’s text to be in a single contiguous memory block.
Boost::Regex has been suggested several times, but it still doesn’t support named captures. When allowing users to specify regular expressions for use in parsing, named captures can significantly simplify the process. For example, when using a regular expression to parse compiler output we have two alternatives:
This uses the standard named capture syntax to name the three capture blocks: “f” for filename, “l” for line and “c” for column. The single expression can be parsed and understood by PN without the user having to understand capture indexing.
This uses basic regular expression capture groups and results in the user having to enter three additional pieces of non-obvious data: the capture index for each capture. In this case these would be 1, 2, and 4 but this would potentially change for each output pattern.
I believe that using named captures significantly improves the user experience here. PN already uses %f, %l and %c to represent the three named capture groups, so users don’t even need to understand regular expression capture syntax to use them.
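To make the placeholder idea concrete, here is a minimal sketch of how %f, %l and %c could be expanded into named capture groups before the expression is compiled. The helper name `expandPlaceholders`, the `(?P<name>...)` group syntax and the sub-patterns chosen for each placeholder are all assumptions made for illustration; this is not PN's actual implementation.

```cpp
#include <string>

// Hypothetical helper: expands PN-style %f, %l and %c placeholders into
// named capture groups. The sub-patterns used here are assumptions.
std::string expandPlaceholders(const std::string& pattern)
{
    std::string out;
    for (std::string::size_type i = 0; i < pattern.size(); ++i)
    {
        if (pattern[i] == '%' && i + 1 < pattern.size())
        {
            switch (pattern[i + 1])
            {
            case 'f': out += "(?P<f>.+?)";    ++i; continue; // filename
            case 'l': out += "(?P<l>[0-9]+)"; ++i; continue; // line
            case 'c': out += "(?P<c>[0-9]+)"; ++i; continue; // column
            default: break; // not a placeholder, copy the '%' through
            }
        }
        out += pattern[i];
    }
    return out;
}
```

With this sketch a user-entered pattern such as `%f:%l:%c` becomes `(?P<f>.+?):(?P<l>[0-9]+):(?P<c>[0-9]+)`, and the engine can look the groups up by name rather than by index.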
Boost 1.35 introduces version 2 of Boost.Xpressive, the other Boost regular expression engine. Xpressive naturally supports iterators, and version 2 adds support for named captures.
Implementing a Scintilla Iterator
Xpressive requires a bi-directional iterator class (one that can move forwards and backwards over the contents). So far I’ve implemented a very simple, naive iterator to prove that this can work:
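/**
 * std::iterator compatible iterator for Scintilla contents
 */
class ScintillaIterator :
    public std::iterator<std::bidirectional_iterator_tag, char>
{
public:
    // GetLength() is assumed to be PN's CScintilla wrapper for SCI_GETLENGTH.
    ScintillaIterator(CScintilla* scintilla, int pos) :
        m_scintilla(scintilla), m_pos(pos), m_end(scintilla->GetLength()) {}
    ScintillaIterator(const ScintillaIterator& copy) :
        m_scintilla(copy.m_scintilla), m_pos(copy.m_pos), m_end(copy.m_end) {}
    bool operator == (const ScintillaIterator& other) const
    {
        return (ended() == other.ended())
            && (m_scintilla == other.m_scintilla)
            && (m_pos == other.m_pos);
    }
    bool operator != (const ScintillaIterator& other) const
    {
        return !(*this == other);
    }
    char operator * () const { return charAt(m_pos); }
    ScintillaIterator& operator ++ () { ++m_pos; return *this; }
    ScintillaIterator& operator -- () { --m_pos; return *this; }
    int pos() const { return m_pos; }
private:
    // GetCharAt() is assumed to wrap SCI_GETCHARAT: one message per character.
    char charAt(int position) const
    {
        return static_cast<char>(m_scintilla->GetCharAt(position));
    }
    bool ended() const { return m_pos == m_end; }
    CScintilla* m_scintilla;
    int m_pos;
    int m_end;
};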
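For readers who want to try the idea without PN or Boost to hand, here is a self-contained sketch of the same iterator shape over a stand-in document whose text is held in two separate pieces. `SplitDocument`, `DocIterator` and `findDigits` are hypothetical names, and `std::regex` is swapped in for Xpressive purely because it is also templated on bidirectional iterators and needs no extra library.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Stand-in for an editor buffer: the text lives in two separate pieces,
// so there is no single contiguous block to hand to a regex engine.
struct SplitDocument
{
    std::string before; // text before the split
    std::string after;  // text after the split

    int length() const { return static_cast<int>(before.size() + after.size()); }

    const char& charAt(int pos) const
    {
        const int split = static_cast<int>(before.size());
        return pos < split ? before[pos] : after[pos - split];
    }
};

// Bidirectional iterator that reaches the text only through charAt().
class DocIterator
{
public:
    typedef std::bidirectional_iterator_tag iterator_category;
    typedef char value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const char* pointer;
    typedef const char& reference;

    DocIterator() : m_doc(nullptr), m_pos(0) {}
    DocIterator(const SplitDocument* doc, int pos) : m_doc(doc), m_pos(pos) {}

    reference operator * () const { return m_doc->charAt(m_pos); }
    DocIterator& operator ++ () { ++m_pos; return *this; }
    DocIterator& operator -- () { --m_pos; return *this; }
    DocIterator operator ++ (int) { DocIterator old(*this); ++m_pos; return old; }
    DocIterator operator -- (int) { DocIterator old(*this); --m_pos; return old; }

    bool operator == (const DocIterator& other) const
    {
        return m_doc == other.m_doc && m_pos == other.m_pos;
    }
    bool operator != (const DocIterator& other) const { return !(*this == other); }

    int pos() const { return m_pos; }

private:
    const SplitDocument* m_doc;
    int m_pos;
};

// Finds the first run of digits in the document. Returns its start
// position (or -1 if there is none) and writes its length to lengthOut.
int findDigits(const SplitDocument& doc, int& lengthOut)
{
    static const std::regex digits("[0-9]+");
    std::match_results<DocIterator> match;
    if (!std::regex_search(DocIterator(&doc, 0), DocIterator(&doc, doc.length()), match, digits))
        return -1;
    lengthOut = static_cast<int>(match.length(0));
    return match[0].first.pos();
}
```

Because the engine only ever touches the text through the iterator, a match can straddle the boundary between the two pieces without the document being copied into one buffer first.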
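ScintillaIterator can then be used with Xpressive like this:
typedef boost::xpressive::basic_regex<ScintillaIterator> sciregex;
typedef boost::xpressive::match_results<ScintillaIterator> scimatch;
typedef boost::xpressive::sub_match<ScintillaIterator> scisub_match;
sciregex regex = sciregex::compile("[0-9]+");
scimatch match;
// regex_match needs an iterator range; m_scintilla is assumed to be a CScintilla*
ScintillaIterator begin(m_scintilla, 0), end(m_scintilla, m_scintilla->GetLength());
if (regex_match(begin, end, match, regex))
{
    // The whole document matched; use regex_search to find matches within it
}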
The ScintillaIterator code is now available in Programmer’s Notepad subversion, and it seems to work. The iterator needs some improvement: it should buffer data from Scintilla, or perhaps be moved, so that it doesn’t have to send a Windows message for every character access. However, as a proof of concept it’s a good one, and it suggests that we should be able to replace the current lacklustre regex searching support with fully featured multi-line support for the next release.
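One way the buffering could look is sketched below: a small cache that fetches text a window at a time and serves individual character reads from it. `BufferedReader` and its fetch callback are hypothetical; in PN the callback might wrap a ranged request such as Scintilla's SCI_GETTEXTRANGE, but here it is a plain `std::function` so the sketch runs on its own. Positions are assumed to be within the document.

```cpp
#include <algorithm>
#include <functional>
#include <string>

// Hypothetical cache that fetches text in windows through a callback
// instead of making one call per character.
class BufferedReader
{
public:
    typedef std::function<std::string(int start, int end)> FetchFn;

    BufferedReader(FetchFn fetch, int docLength, int windowSize = 256) :
        m_fetch(fetch), m_length(docLength), m_window(windowSize),
        m_start(0), m_fetches(0)
    {}

    // Returns the character at pos (0 <= pos < docLength), refilling the
    // cache only when pos falls outside the window currently held.
    char charAt(int pos)
    {
        const int cachedEnd = m_start + static_cast<int>(m_cache.size());
        if (pos < m_start || pos >= cachedEnd)
        {
            // Centre the window on pos so stepping backwards stays cached too
            m_start = std::max(0, pos - m_window / 2);
            const int end = std::min(m_length, m_start + m_window);
            m_cache = m_fetch(m_start, end);
            ++m_fetches;
        }
        return m_cache[pos - m_start];
    }

    // Number of times the callback has been invoked so far
    int fetchCount() const { return m_fetches; }

private:
    FetchFn m_fetch;
    int m_length;
    int m_window;
    int m_start;         // document position of m_cache[0]
    int m_fetches;
    std::string m_cache; // text for [m_start, m_start + m_cache.size())
};
```

An iterator's `charAt` could delegate to a reader like this, trading a little memory for far fewer round trips. Centring the window is a design choice that favours a regex engine's occasional backtracking over pure forward scans.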