November 11, 2010

In August, I presented about Haddock internals to the Boston Haskell User’s Group, at MIT.  I drew on a white-board for the visuals.  Here’s what I drew (except neatened and digitized, not dry-erase-marker ink).

Haddock / GHC / Code

Here’s the source Inkscape SVG (WordPress.com doesn’t allow using SVG images currently).

Summer of Code Wrap-Up.

August 26, 2009

Status summary: Patches are pending to Haddock and GHC [update: they have been included, and released in GHC 6.12.1 and Haddock 2.6.0].  Processing the guts of Haddock comments gets moved out of GHC to Haddock, although recognizing “– |” etc. are still done by GHC.

In fact, at top-level they’re only recognized stand-alone by GHC (data DocDecl: DocCommentNext, DocCommentNamed, etc.), and Haddock gets to match them up with their definitions (Haddock.Interface.Create.collectDocs).  Inside function types, data constructors and record-fields, though, they still have to be parsed to a more precise attachment by GHC, which occurs in compiler/parser/Parser.y.pp.  But the interiors of the comments can now be parsed by Haddock, at least.  Technically, that is Lexing, Parsing and Renaming RdrName->Name.  Not to be confused with Haddock’s later renaming Name->DocName that tries to figure out where the documentation links for each rendered thing (like Int or Monad or concatMap) should point to.  Renaming RdrName->Name instead starts with something like “concatMap” or “SomethingSauce.Exts.Int16” and figures out what the original defining module is, given the context in which the RdrName appears.  For Haddock, this is determined solely by the modules’s GlobalRdrEnv which contains information like “there was an ‘import GHC.Exts as SomethingSauce.Exts hiding (seq)’” and “This module defines a function or value called ‘foomatic’”.  Full renaming needs to be more sophisticated and resolve the right-hand-side ‘x’ in ‘f x = x’ correctly, by looking more places than the top-level, but Haddock comments don’t have any local scope like that.  At least not currently.

General important concepts in Haddock:

There is a data Interface (Haddock.Types) for each module processed.  This is computed by a sequence in Haddock.Interface, most of which is in Haddock.Interface.Create.  This data Interface is used to render the HTML docs for the module (Haddock.Backends.Html . Which uses an old locally-kept version of an HTML combinator library). (Or instead/additionally to HTML, it can make Hoogle info.)

This would be simple if every module were self-contained, but it isn’t. Haddock needs to find out about other modules, in order to link to them and re-export things from them.  In order to find out about other modules in the group currently being processed (typically a package), Haddock uses a fold over the group’s dependency graph and passes the depended-upon modules’ Interfaces to Haddock.Interface.Create.createInterface.  Note that this has consequences for modules that import each other, although I think it might work acceptably/imperfectly in the presence of .hs-boot files.  Haddock also loads, in sequence, each of this group of modules using GHC (The GHC-API, -package ghc), so GHC can tell us all about them.

Cross-package, the situation is more complicated.  For tedious reasons having to do with space/time efficiency or ease of implementation or nondeterminism or something, we don’t just save all .hs files and Interfaces and stuff to disk.  GHC saves “.hi” files to disk for each module, which tell it about all exported information that’s relevant to a compiler, and a bit more. (For example, it doesn’t include doc comments or remember whether a data was declared GADT-style. Probably. There are some weird things it does remember, like whether a constructor was declared infix.)  These .hi files are how it can possibly compile the modules that you’re haddocking now.  They also let us look up the declarations and types in those .hi files, incidentally (with GHC.lookupName) — though it’s a conversion effort to turn them into HsDecls (see Haddock.Convert, a new module added by my patches).  Haddock, likewise, has to record some information about a module in parallel to the .hi file.  Haddock.Types.InstalledInterface contains this information for each module — it’s a subset of Interface that mostly contains docs since GHC’s .hi doesn’t save any information about them. (And we’re lazy/stingy so we still depend on GHC for type information, despite its imperfections for our purpose.) When a module is being processed, its Interface is created, and then the InstalledInterface subset is saved to disk.  Actually it’s more complicated, because Haddock, unlike GHC, creates a single .haddock interface-file for each *group* of modules it processes (see Haddock.InterfaceFile).  Then when you haddock a dependent module, Haddock loads those .haddock files and looks for info in them rather akin to how it would look for information in a locally imported module’s Interface (though it’s always a different though nearby code-path).  At least hopefully Haddock loads those .haddock files; it has to be told where they are on the command-line.  Cabal will helpfully do so for you, as long as the depended-upon packages have got any installed haddock documentation!

Okay… I think that’s a general overview for now.  Questions?  Was I confusing or clear?

Next Project, Move doc-parsing to from GHC to Haddock

July 21, 2009

I’ve done the first step now! I made a patch that turns the representation of HsDoc in GHC into a FastString rather than a parsed entity, and deleted the parsing code and made it compile.  (HaddockModInfo will need to be FastString-ized also.)

(By the way, this means parsing the interiors of the comments.  GHC will still be the one to recognize “– |”, “– ^” and so forth, for this phase of the project, and to attach them to the parsed declarations.)

Next comes the presumably harder part: add support in Haddock!  (at least the final-product will need to be full of #ifdefs, in order to keep supporting GHC < 6.11, also.)

more cross-package docs details

July 20, 2009

I cleaned up most of the loose edges.  I still need to design and implement the GADT records syntax (should be easy) ; I was thinking:

Here is an example of haddock’s record-display:

http://hackage.haskell.org/packages/archive/mtl/1.1.0.2/doc/html/Control-Monad-Cont.html#v%3ACont

and here’s for an ordinary data type:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Maybe.html#t%3AMaybe

Notice that records can have documentation per-constructor-argument.

So we need to decide what syntax we want, since there’s a tension between displaying the type of the constructor itself (including any class contexts, as GHC just implemented syntax for!), and having a separate line on which to document each field.

I’m inclined to copy type information a bit and have it look like, for
data Foo a where
Bar :: Ord a => { f1 :: [a]  — ^ Some values.
} -> Foo a
,
Bar :: Ord a => [a] -> Foo a
f1 :: [a]     Some values.

Or maybe explicit forall “Bar :: forall a. Ord a => [a] -> Foo a” since it obviously needs to (conceptually) scope over the following fields.

-Isaac

Cross-package documentation going well!

July 9, 2009
A good deal of success!  Here is the current status of my loose edges… Plus I’m sure there are some inevitable syntax niceties (choosing when to display in GADT syntax for example), but using this approach, they can’t be perfect, only pretty-good.  (The approach is similar to going through Template Haskell except we found out that TH/Convert had some problems, such as producing RdrNames rather than Names, so I wrote the equivalent function to go directly from TyThing to HsDecl Name).  So:(tyThingToHsSyn :: TyThing -> LHsDecl Name) is done (except probably for a few corner cases) — it compiles and works! (under 6.11.something and 6.10.3; I just failed to install ghc-paths for 6.8.2[*]) Paste of my working module here:
http://hpaste.org/6717

[*]cabal install –with-compiler=ghc-6.8.2 ghc-paths
/tmp/ghc-paths-0.1.0.522545/ghc-paths-0.1.0.5/Setup.hs:7:7:
Could not find module `Distribution.Simple.PackageIndex’:

Important remaining corner cases include:
- It seems that TyCon needs to export a new function (tyConParent :: TyCon -> TyConParent), because there’s no way to get that info presently!
- (isFunTyCon tc || isPrimTyCon tc) : how the heck am I supposed to represent these in a TyClDecl? :-)
- When the HsDecls tree contains something of type HsDoc, do I just leave it empty? Will Haddock-code need to / already does fill that in? I guess I’ll find that out once I test my code on more things…? or what?
- it looks like I neglected to implement class ATs yet.

Also, (parseName “Prelude.->”) crashed GHC with “isDataOcc: check me ->”, so I wasn’t able to test it yet :-) … or is it one of those syntaxes that’s so fixed that you’re not even allowed to export or import it, perhaps?

***

Also I succeeded at integrating it into Haddock.  Well, almost.  For module List, I have the English documentation being displayed where it should be, and I have the signature displayed: all the signatures are “() -> ()”, :-P .  I still need to finish threading the Ghc monad through Haddock.Interface.Create so that I can call lookupName at the right times.
- foible: I needed to convert InstalledInterface’s (HsDoc DocName) to ExportDecl’s (HsDoc Name), so I had to essentially write fmap for HsDoc, because it’s lamely not an instance of class Functor.  Where should that fmap-code be put? (can it be derived yet?)
- In Haddock.Interface.Create:
-* Under what circumstances is (Map.lookup (nameModule t) instIfaceMap) == Nothing?  Is it if we haddock a package for which its dependencies are compiled, but not, themselves, haddocked?
-* I didn’t figure out where to get ExportDecl’s expItemInstances from (but I didn’t really look around for it yet)

okay that’s enough for the day! I need some sleep! And some feedback would be good.  Besides that, I do have a plan: Integrate the two pieces I just did; Test on lots more examples than just haskell98-List; fix the things that I obviously neglected; Make the code a bit more presentable.  Any big pieces I’m leaving out that we have to do before starting to integrate my code?  (well, hoping that there aren’t any more big unexpected things that break along the way, anyway)

-Isaac

cross-package, Plan A

June 25, 2009

I think my next step should be the simpler bit: extend .haddock files to contain their docs. (After doing that, I might have a better guess where best to thread in the type-signatures.)

Which means,

#1. That we need to write the info to .haddock files
#2. that we can parse it.
#3. that InstalledInterface can hold it
#4. find out if there’s anything we need to do to make use of the new info in InstalledInterface… well obviously there is :-)

A test case would be a package, that imports and re-exports anything that has a doc-string.  I could probably download haskell98 from hackage and use it as a test case.

This is out of order I’m sure: at least writing the info probably means we have to change InstalledInterface first, for example.  That’s what plans are for: trying them and refactoring them when they fail!

P.S. I’m doing an awful job at synchronizing my sleep cycle with SimonMar’s… it’s 2 a.m. U.S. East Coast time here, time for me to go to bed (or past time rather :-)

Cross-package documentation, part 1

June 24, 2009

It would appear to the casual observer that Haddock works fairly excellently cross-packages.  For example, I haddocked the Haddock code, and all the links to the types from the GHC API point to the haddock-docs in my GHC 6.10.3 installation.  (Yes, you need the versions of the dependent packages chosen, installed and haddocked, and this doesn’t always work out optimally online. But that’s not the problem that I’m confronting.)

Yes, links work.  But that’s about it, right now.  Compare (broken)  http://www.haskell.org/ghc/dist/stable/docs/libraries/haskell98/List.html to (good) http://www.haskell.org/ghc/dist/stable/docs/libraries/base/Data-List.html .  The broken one’s source (haskell98:List) imports the good one (base:Data.List).  haskell98:List’s documentation luckily lists the *names* of the exported things, because they’re listed in that file’s export list.  But not their types or their docs.  Not the module doc header (though that shouldn’t be copied anyway), and there’s no link to base:Data.List (which is acceptable. Although maybe someone should write a haddock-header to haskell98:List that says it’s the Haskell 98 version (specified at http://www.haskell.org/onlinereport/list.html ) of the slightly expanded base-package Data.List[link].

Anyway, the way we get any links to modules in other packages is because we read their generated .haddock files.  They’re binary and haddock doesn’t have any obvious way to make them human-readable, but they correspond to Haddock.InterfaceFile.InterfaceFile.  Which is, roughly, a list of Haddock.Types.InstalledInterface, which seems to be the interesting bit.

data InstalledInterface = InstalledInterface {
instMod            :: Module,
instInfo           :: HaddockModInfo Name,
instDocMap         :: Map Name (HsDoc DocName),
instExports        :: [Name],
instVisibleExports :: [Name],
instOptions        :: [DocOption],
instSubMap         :: Map Name [Name]
}

A Module (not to be confused with GhcModule, which is defined by Haddock) is a low-information type defined in GHC that contains the package name and version and the module name.

HaddockModInfo is just the module’s header description, plus the portability:, stability:, maintainer: fields (HaddockModInfo is defined in GHC, oddly enough: must be parse result. defined in ghc:HsSyn to be precise.)

DocOptions are hide, prune, ignore-exports, not-home. (defined in haddock code.)

The rest contain a lot of “Name”s, which is a GHC thing that refers unambiguously to the place an identifier originates.  Sufficient for making a link, but not sufficient by itself for copying the named identifier’s docs or type.  So passing over them for now… there is one interesting thing left.

A DocName contains a Name and also (if any) the module we’d like to link to in which that name is documented.  instDocMap :: Map Name (HsDoc DocName).  This gives more info on any number of Name identifiers.  (HsDoc provides formatting, DocName provides its references .)  There is no type information here at all, as far as I can tell, which will clearly need to be remedied somehow (in Interface, roughly a superset of InstalledInterface, types appears in ifaceDeclMap, though I’m not sure if that’s where they’re retrieved from for HTML-doc-printing).  But there’s another big question I need to find out: *which* names are documented in any given module’s instDocMap?  The type provides no clue, nor does its current (lack of a) doc string, nor ifaceRnDocMap.  I could look everywhere in the code that it’s generated, or I could ask David Waern… who will need to tell me if I said anything confused here anyway :-)

How To Navigate Your Code:

June 23, 2009

I’ve always wished I could document my own code with Haddock even when it isn’t exported.  Today I was exploring the Haddock code-base (which contains a fair number of Haddock-comments) and wanted an easier way to read it.

First I tried `cabal haddock –hyperlink-source`.  It only documented Distribution.Haddock because that’s the only exported module of the library!  `cabal haddock -v` told me that it was passing `–hide=all the other-modules` to haddock, which I couldn’t reverse with `cabal –haddock-option`.  The only way to “fix” that was to hack haddock.cabal so that all modules appear under an `exported-modules:` clause (which is only allowed under `library`, not `executable`, so I had to spuriously merge the executable section into the library section).  Yay!  Then I had to remove Haddock.Types’s {-# OPTIONS_HADDOCK hide #-} which was causing a bit of trouble.

Adding –haddock-option=–ignore-all-exports includes non-exported functions in the documentation, which can be helpful or unhelpful.  I wish it had a mode to include the un-exported stuff *after* all the exported stuff, with a clear separation.  As a side note, –ignore-all-exports also “breaks” docs of Distribution.Haddock and any module that re-exports things from other modules, currently…

HsColour is pretty cool too, but it would be even cooler if it hyperlinked all the identifiers in the source, like Haddock does… hmm.

EDIT: I forgot to attach my actual silly patch to Haddock.  Can’t find out how to get WordPress to let me upload it either. oh well, here it is: gee, I can’t even get it to turn into a link, here on Linux?: http://isaac.cedarswampstudios.org/2009/isaacs-silly-haddock-self-doc.dpatch

another boring update, & question :-)

May 26, 2009

got a cold, decided to take a break for the weekend, and is only beginning to start looking at the Haddock code (in ghc/utils/haddock which is from darcs.haskell.org/haddock2.  Apparently it’s managed by “darcs-all” in the ghc tree…).  If there’s any higher-level documentation (online? in the doc directories?) maybe I should find and look at it too.  Spend a few hours there, and once I do that, maybe it’ll be time to pick a small bug/feature ticket to fix..

also see http://trac.haskell.org/haddock/wiki/CrossPackageDocumentation … Does anyone mind picking the second option (reconstruct info from .hi files, at the expense of its formatting details)? — you can comment on this post (is that right? If that doesn’t work, email me at my address listed on http://www.idupree.com/ )

platform progress

May 20, 2009

Jaunty successful!  (though KMail seems a bit unstable)  ghc 6.8.2, darcs 2.2.0

Now Darcs is being annoying, but I did compile a GHC (6.10.3) in 50 minutes, without using parallelism (I didn’t want to worry about remembering how advanced the old build system was)

things are coming along nicely…


Follow

Get every new post delivered to your Inbox.