Musings on a working life: 2015

Wednesday, December 2, 2015

Good news (and bad news for this blog)

The good news is, I got a new job that I really like! I'm now working as an information developer for IBM Watson; the technology is really interesting.

The bad news for this blog is, we're encouraged to blog internally, which affects our performance reviews. So, now I'm blogging about my job at my job, and I doubt I'll keep up two separate blogs. That said, I might find time now and again to adapt one of my internal posts to a public audience -- we'll see!

Monday, June 8, 2015

Wisdom from the wellspring: attending a Central Texas DITA Users' meetup

I've been meaning to attend a Central Texas DITA Users' meetup for ages now, having lurked on their message boards for years. And I finally went!

"DITA users' group" doesn't begin to describe it--"creators-of-the-DITA-OT, there-from-the-beginning-and-still-fighting-the-good-fight users' group" might start to cover it.

Baby steps toward a Bayes classifier

Actually, the blog title is a misnomer--I started trying to code up a naive Bayes classifier while studying chapter 6 of the Python NLTK book, but I didn't get very far before I switched over to a tutorial on the scikit-learn random forest classifier, because it aimed at building something instead of demonstrating little building blocks.

I haven't gotten far, but at this point I can create a feature set from a cleaned-up 'bag of words' using scikit-learn.
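For the curious, here's a minimal sketch of what a bag-of-words feature set boils down to. It's plain Python rather than scikit-learn (whose CountVectorizer does this job for real), and the function names and sample sentences are made up purely for illustration.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and keep only runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_features(documents):
    """Return (vocabulary, matrix); each matrix row counts the vocabulary words in one document."""
    tokenized = [clean(doc) for doc in documents]
    # The vocabulary is every distinct word across all documents, in sorted order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A Counter returns 0 for words that never appear in this document.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Invented sample documents, not data from the tutorial.
docs = ["The movie was great, great fun!", "The plot was dull."]
vocabulary, matrix = build_features(docs)
```

Each row of the matrix is one document and each column is one word, which is the shape a classifier like a random forest expects as input.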

Next up: training the random forest! (I love the terminology involved, by the way. "Training the random forest" sounds like a nonsense poem.)

Tuesday, May 12, 2015

Functional programming is hard!

I recently got re-interested* in natural language processing, in conjunction with working on a taxonomy/ontology and reading about applications for taxonomies and ontologies.

I found a course with all materials on GitHub that looked pretty interesting: Applied Natural Language Processing, and thought I'd give the exercises a go in my free time.

But. But. From day one, the coursework emphasizes functional programming. And it's like starting from square one!

More work in coding land

A lot happening in work-land recently; hard to keep up with blogging! Some brief updates on the coding front:

I can call myself a help developer
'Cause I wrote Python code to build our help system that actually made it into the product's driver build, yay!

Specifications as a taxonomy of product features

I've been working hard recently to gather specs for a new hardware product I'm documenting. Combing through competitor docs to compare organization, hierarchy, and wording gave me lots of fodder for my Excel-based taxonomy, increased my conceptual understanding, and springboarded my first draft for a hardware help table of contents.

When I think about it, a specs doc is really a beautiful thing. It tries to encapsulate a body of knowledge about a device and its industry in an organized and concise fashion. In some ways it directly mirrors portions of my taxonomy--just reformatted. And in fact, when Schema.org 2.0 was recently released, I browsed around the release, and my ideas about the similarities of taxonomies and specifications were reinforced. For example, check out the automotive ontology proposed extension to Schema.org -- all those properties of vehicles look a lot like (both internal and external) specification categories to me!

Since I'm currently implementing my taxonomy in DITA as a means of vocabulary control and consistency, I have to consider how to source the specs and share them with other parts of the help. Increasingly I'm coming to the opinion that in single-sourced documentation, specs should be used as the single source of truth. Well, actually, I have a more complicated opinion than that. Certainly in terms of actual numbers, the specs doc is the single source of truth, i.e. in DITA speak, it's the source for all conrefs. But for more 'functional' or 'definition-like' specs that act more like a controlled vocabulary and whose impact ripples through the entire help, perhaps the source of truth should be just that--a controlled vocabulary in the form of a keydef definitions document or 'library of conrefs.' Time will tell if my strategy works!
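To make the two-sources idea concrete, here's a rough sketch in Python. It isn't DITA and it isn't how the DITA-OT resolves conrefs or keys; it only mimics the idea that topics point at a key while the value lives in exactly one place. The key names, values, and the `{key}` placeholder syntax are all invented for illustration.

```python
import re

# Hypothetical single source of truth for numeric specs (values invented).
SPECS = {"weight": "1.2 kg", "power-draw": "40 W"}

# Hypothetical controlled vocabulary for definition-like terms.
VOCABULARY = {"scan-mode": "continuous scan"}


def resolve(text, *sources):
    """Replace each {key} placeholder with its value from the first source that defines it."""
    def lookup(match):
        key = match.group(1)
        for source in sources:
            if key in source:
                return source[key]
        # Fail loudly so a dangling reference never ships silently.
        raise KeyError(f"No definition found for key: {key}")

    return re.sub(r"\{([a-z-]+)\}", lookup, text)


topic = "In {scan-mode}, the unit weighs {weight} and draws {power-draw}."
resolved = resolve(topic, SPECS, VOCABULARY)
```

Change a number in the specs dictionary or a term in the vocabulary, and every topic that references it picks up the change on the next build, which is the whole appeal.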

Tuesday, March 24, 2015

...Wait, there are alternatives to DITA for small companies?

I'm really excited to follow Tom Johnson's upcoming series on structured authoring in DITA vs. Jekyll at http://idratherbewriting.com/2015/03/23/new-series-jekyll-versus-dita/.

When Tom first started writing about reusing static content with Jekyll (http://idratherbewriting.com/2015/02/27/static-site-generators-start-to-displace-online-cmss/) he kind of blew my mind. I'd gotten the impression that for writers at small companies/startups, there weren't any really good content reuse technologies out there. But the templates Tom authored look super slick!

In fact those templates stand in stark contrast to my own experiences tinkering with the DITA-OT. A couple years ago, I spent a week, purely for my own edification, following this tutorial: DITA for solo writers.

It was a heroic effort on the author's part, I must say! And after a bunch of effort, I did in fact manage to hack the DITA-OT with a small topic specialization and content reuse.

And what was my output? Minimally formatted PDFs (I felt lucky I got an output at all; fixing broken PDFs was a big issue), and seriously old-school HTML output:

Never been so proud of my coding skills...

I got a permanently dedicated section of our department's newsletter. Someone took the time to make me a logo! I feel so loved.

The actual script they chose to feature in their newsletter this month is ever-so-humble, and was really quick and easy for me to author...but apparently it had an impact, because I'm already seeing reviewers adopt it. Nice when it works out that way!

Tuesday, February 10, 2015

Working on my first taxonomy!

I just finished taking my online taxonomy and controlled vocabularies class through Simmons College, and I'm excited that it's turning out to be directly applicable to my everyday work. Fortunately I'm part of a release where the developers seem quite keen on maintaining consistent terminology, so I feel I have internal support to geek out on terms and their relationships.

It's been a great exercise for me--it feels like a creative process to decide how to relate terms to each other and decide on facets. Meanwhile, deciding on hierarchical relationships really forced me to formalize a whoooole bunch of technical learning that was floating around in my brain.

I'm currently maintaining an Excel spreadsheet of terminology decisions, which I'm calling a 'thesaurus' because it includes hierarchical relationships, nonpreferred terms, and related terms. However, I've already gotten a little off the beaten track in terms of 'classic' thesauri--for instance, I have to maintain both 'accepted equivalent terms' and nonpreferred terms. That's what happens when your API team and your app software team head in different directions, one going with industry standards and the other with internal standards! No really easy solution there, unfortunately.
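If the spreadsheet ever outgrows Excel, the same structure could be sketched in a few lines of Python. This is only an illustration of the relationships described above, not the actual thesaurus: the class, the field names, and the sample terms are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    """One thesaurus row: a preferred term plus its relationships."""
    preferred: str
    broader: Optional[str] = None  # hierarchical parent, if any
    accepted_equivalents: List[str] = field(default_factory=list)  # both labels are allowed
    nonpreferred: List[str] = field(default_factory=list)  # labels to steer writers away from
    related: List[str] = field(default_factory=list)


def build_lookup(terms):
    """Map every known label (lowercased) to the preferred term it belongs to."""
    lookup = {}
    for term in terms:
        for label in [term.preferred, *term.accepted_equivalents, *term.nonpreferred]:
            lookup[label.lower()] = term.preferred
    return lookup


def broader_chain(name, terms):
    """Walk up the hierarchy from a preferred term to its top-level ancestor."""
    by_name = {term.preferred: term for term in terms}
    chain = []
    current = by_name[name].broader
    while current is not None:
        chain.append(current)
        current = by_name[current].broader
    return chain


# Invented sample terms, not entries from the real spreadsheet.
terms = [
    Term("device"),
    Term("sensor", broader="device"),
    Term(
        "temperature sensor",
        broader="sensor",
        accepted_equivalents=["thermal sensor"],
        nonpreferred=["temp probe"],
        related=["thermostat"],
    ),
]
lookup = build_lookup(terms)
```

Keeping accepted equivalents and nonpreferred terms in separate fields preserves the distinction between "either label is fine" and "please use the other one," even though both resolve to the same preferred term.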

At first I was a little bummed to author in Excel when I'd just learned about all sorts of fancy thesaurus software like Data Harmony and Synaptica, but this post clued me into creating collapsible sections in Excel, which eased the pain: http://taxodiary.com/2012/04/maintaining-a-thesaurus-in-an-excel-workbook/
Overall I think it's the best decision for sharing the thesaurus with other content creators and stakeholders. I've already used it as a conversation starter with my co-writer, and we edited it quite a bit as a result.

Here's a little peek:

Finally, when it came to looking at existing thesauri, I came up short. That worries me a little about the state of the field--why are so many abandoned ontologies, taxonomies, and thesauri floating around on the Web? I mean--I actually used the Wayback Machine to find this dictionary: https://web.archive.org/web/20130507093403/http://www.sematech.org/publications/dictionary/u_and_v.htm

On a related note, does Linked Data on the web really have momentum behind it? Or is it that taxonomy and ontology efforts have a future in enterprises but not as much on the web? I'm definitely going to be watching this debate with interest: http://www.iskouk.org/content/great-debate

Monday, February 9, 2015

From zero to editing an Android app in 4 hours

Whoohoo! I spent a fun afternoon...learning how to edit an Android app!

It all started when a coworker mentioned how nice it'd be if we had a mobile app to display our internal techcomm conference schedule. I remembered someone who'd left had developed such an app in 2011, and thought, "I know nothing about Android app development. Sounds like a fun excuse to learn something!"

And 4 hours later, I'd figured out how to get the original app developer's source code, edit it, build it, and install the revised app on my phone. Yay!

Notes on the UX of some taxonomy/controlled vocabulary software

I'm taking Simmons College School of Library and Information Science's online Taxonomy and Controlled Vocabularies course at the moment, and thought I'd post some notes and first impressions I had as a beginner of the software we're trying out in the class.
