Notes: Electronic Piers Plowman: Implementing an Edition of a Six-Hundred-Year-Old-Poem for Twenty-First Century Students

March 7, 2008

These are my own notes from the Research Conversation about representing Piers Plowman today, March 7. Presented by Terry Brooks and Miceal Vaughan. (Note (3/11/2008): I went back in and cleaned some of the formatting up on this, since apparently Windows Live Writer is not quite as consistent as I’d like.)

Miceal:
- Work is an electronic edition of the poem aimed towards beginning students
- 14th century poem in 3 distinct versions – 50+ manuscripts!
- First version ~1360s – author is uncertain, could be multiple issues
- Trying to build an archive of all surviving manuscripts of the poem
- Want to produce the poem so that it’s accessible in different formats (wording/spelling formats)
- Review of different versions of first four lines – spelling differing from Middle English to modern with different representations of letters: þ (equivalent to “th”), for instance
- Miceal has been working on this for roughly 15 years.
- Wants to present clean, uncluttered version so that it can be worked with directly
- Had an HTML version created for students to work with, built by people knowledgeable in HTML
Terry:
- Problem: Content of the poem is being held hostage by a fixed presentation format. Wants to divorce the format from the content.
- Given the original HTML version; first had to go into the source and create a parser that parses out content. Has to also deal with inline commenting. Wants to “unfreeze” the content and put it into a general format.
- XML structure: Section (prologue, passage) made up of many lines. Break the poem down to a word-by-word unit level, which could have separate spellings and spacing requirements and (of course) the value of the word itself. Also has word-by-word footnotes, which have author and line information.
- Problem with representation in XML: HTML character entities (the escape codes used for characters such as þ) don’t cooperate well with XML parsers. Solved by HTML post-processing that replaces placeholders with the proper characters.
- This was initially done so that it would be fixed, but editing was a need – created an XML editor to allow for these attributes to be changed without knowing the XML itself.
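A minimal sketch of the entity problem and the placeholder workaround, assuming Python's standard library. The placeholder format `[[ent:name]]` and the function names are hypothetical; the talk did not say what placeholders Terry actually used.

```python
import html
import re
import xml.etree.ElementTree as ET

# The five entities an XML parser understands without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
PLACEHOLDER = re.compile(r"\[\[ent:([A-Za-z][A-Za-z0-9]*)\]\]")


def protect_entities(markup: str) -> str:
    """Swap HTML-only named entities for placeholder tokens (hypothetical format)."""
    def swap(match):
        name = match.group(1)
        return match.group(0) if name in XML_PREDEFINED else f"[[ent:{name}]]"
    return ENTITY.sub(swap, markup)


def restore_entities(text: str) -> str:
    """Turn placeholder tokens back into the characters they stand for."""
    return PLACEHOLDER.sub(lambda m: html.unescape(f"&{m.group(1)};"), text)


raw = "<word>&thorn;at</word>"

# A plain XML parser rejects the HTML entity outright.
try:
    ET.fromstring(raw)
    parsed_directly = True
except ET.ParseError:
    parsed_directly = False

# With placeholders, parsing succeeds and the character is restored afterwards.
element = ET.fromstring(protect_entities(raw))
word = restore_entities(element.text)
print(parsed_directly, word)
```

The point of the round trip is that the XML stays parseable by any standard tool, while the special characters only reappear at presentation time.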
Current XML version focuses on the Knott-Fowler and Vaughan versions of the poem, and encodes line-by-line metadata on each version.
How does one flag the reader that there’s content hidden behind a particular word? Their tactic: change the icon.
How does an expert in Middle English want to read more than one text? Side by side, one above and one below? Actually, it turns out that they want two versions of each line presented together rather than side by side.
Question to audience: How would you build an application so that you can compare two versions (from an information architecture standpoint)?
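As a rough illustration of the word-level XML structure and the line-by-line pairing described in these notes, here is a small sketch. The element and attribute names, the version labels, and the note text are all assumptions, not the project's actual schema, and the sample wording is illustrative rather than taken from either edition.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: section -> line -> version -> word (-> note).
SAMPLE = """\
<section type="prologue">
  <line n="1">
    <version name="original-spelling">
      <word>In</word><word>a</word><word>somer</word>
      <word>seson<note author="hypothetical editor" line="1">placeholder note</note></word>
    </version>
    <version name="modern-spelling">
      <word>In</word><word>a</word><word>summer</word><word>season</word>
    </version>
  </line>
</section>
"""


def render_line(line):
    """Return one text row per version, so the versions of a line sit together."""
    rows = []
    for version in line.findall("version"):
        words = []
        for word in version.findall("word"):
            text = (word.text or "").strip()
            # "*" is a stand-in for flagging a word that has a note behind it.
            marker = "*" if word.find("note") is not None else ""
            words.append(text + marker)
        rows.append(f'{line.get("n")} [{version.get("name")}] {" ".join(words)}')
    return rows


section = ET.fromstring(SAMPLE)
output = [row for line in section.findall("line") for row in render_line(line)]
print("\n".join(output))
```

Because each line carries all of its versions, the renderer can group them per line rather than laying out two full texts side by side.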
Question from audience: How do you make this project expandable beyond Piers Plowman?
- Some of these strategies may be reusable.
- Miceal: Trying to give access to a variety of versions.
- Terry: Yes, aiming at a generalized information architecture
Question from audience: Why not database-driven instead of XML?
- Terry: Portability.
Question from audience: How do scholars process and comment upon these works? Does literature exist that researches how this is done?
- Miceal: No – everyone comments on these works differently.
- Terry: The XML editor spurred the same question – how do people edit these things? Feedback regarding how Miceal edits led to editor program changes.
- Miceal: There may be a possibility of redefining how commenting is done via the editor.
Question: Pedagogically, is this the same as you would have had students do by looking at two versions side by side?
- Miceal: They would have gotten one version. Wants this to be something usable by anyone interested, but also for people to be able to understand the significance of different versions of the same work.
- Terry: I would ask the same question, different angle: How do people read these poems?
- UI design: Miceal doesn’t want underlining or coloring of words – wants the data behind the document to be transparent
Miceal envisions this project as an archive.
Audience: Interesting part of this is learning from the juxtaposition of different texts. But you’ve got a grid of future/current technologies and activities to consider.
Miceal: Johns Hopkins not interested in keeping the electronic version of this information, but others might be interested in coordinating this.
Audience: This would embrace different learning styles that look at the same work from different angles. Example: When translating works, you might work from an existing translation to check your own for accuracy and to help you understand the text. Someone else might benefit more from struggling through the translation process without any checks against existing translation work.
Audience: This XML structure really doesn’t seem that portable, since it’s dependent on whether other documents use the same attributes.
Terry: Correct – not sure how generalizable the schema is.
Audience: This would have huge problems going across languages (French, Spanish, etc.)
Miceal: This representation scheme would work for words, but not for prose.

Blog RSS

March 4, 2008

Zach Hale points out that my RSS feed only publishes post summaries. Whoops. I’ve changed it (I hope) so that it properly pushes out entire posts.

For all two of you that might follow this blog using something other than the web site.

Tweeting

March 2, 2008

I’ll credit Zach Hale for first making me wonder why the hell Twitter was really even worth thinking about (though I can’t seem to locate my original comment on his blog to that effect).

After much resistance, I’ve finally set up my Twitter account (you can find it on my Profiles menu on this site’s navigation bar). Why? This series of articles had a lot to do with it, but I also decided that I’d take a page from the book of one of my co-workers, Martin Criminale, and at least try throwing my hat in the ring. And, of course, Zach had a bit to do with it.

Now if they only had an import option that allowed me to upload contacts without sending out invitations (the Gmail contact import doesn’t appear to be working for me at this point). I’m also curious about whether it might be possible to integrate my blog posts and my Twitter posts in such a way that they all appear in a continuous stream on this page (without necessarily being an entry in my WordPress RSS feed). It’s probably doable, just a question of figuring out how. Tweets, as they call Twitter entries, would have to be indicated, but that’s not overly hard. Perhaps a combination of SimplePie and my standard WordPress template code?
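A quick sketch of the merging logic, assuming both feeds have already been fetched and reduced to a date, a source, and some text. On the site itself this would presumably live in PHP alongside SimplePie and the WordPress template; Python is used here only to show the idea, and the `Entry` type and sample entries are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Entry:
    published: datetime
    source: str  # "blog" or "twitter"
    text: str


def merged_stream(posts, tweets):
    """Combine both lists into one stream, newest first."""
    return sorted(posts + tweets, key=lambda entry: entry.published, reverse=True)


# Placeholder entries, not real posts or tweets.
posts = [Entry(datetime(2008, 3, 2, 9, 0), "blog", "Tweeting")]
tweets = [
    Entry(datetime(2008, 3, 2, 12, 30), "twitter", "placeholder tweet"),
    Entry(datetime(2008, 3, 1, 8, 15), "twitter", "another placeholder"),
]

stream = merged_stream(posts, tweets)
for entry in stream:
    # Tweets get a visible label so they are distinguishable from posts.
    label = "[tweet] " if entry.source == "twitter" else ""
    print(entry.published.isoformat(), label + entry.text)
```

Since the merge happens at display time, the tweets never need to become entries in the WordPress RSS feed.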

Tautologies

February 28, 2008

A discussion in IMT 530 reminded me of tautologies and, more generally, of logical assertions built from variables. You can lay such assertions out in what are called truth tables – tables that show when a particular condition holds (a tautology being an assertion that holds on every row). Thus, if I treat two variables – A and B – as boolean values (true/false), then ask what happens when we apply the AND operation and the OR operation to these two variables separately, I end up with a table that looks like this:

A	B	A AND B	A OR B
T	T	T	T
T	F	F	T
F	T	F	T
F	F	F	F

This skips the formal notation. You can go further – there are inference notations, NOT notations (an inversion), and also NOR and NAND (not-or and not-and), which are simply NOT applied to the result of OR and AND respectively.
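The table can also be generated mechanically. The sketch below enumerates every combination of A and B, adds NAND and NOR columns as NOT applied to AND and OR, and includes a small check for whether an assertion is a tautology (true on every row). The function names are just for illustration.

```python
from itertools import product


def truth_table():
    """Rows of (A, B, A AND B, A OR B, A NAND B, A NOR B)."""
    rows = []
    for a, b in product([True, False], repeat=2):
        rows.append((a, b, a and b, a or b, not (a and b), not (a or b)))
    return rows


def is_tautology(formula, arity):
    """True when the formula holds for every assignment of its variables."""
    return all(formula(*values) for values in product([True, False], repeat=arity))


rows = truth_table()
print("A\tB\tAND\tOR\tNAND\tNOR")
for row in rows:
    print("\t".join("T" if value else "F" for value in row))

print(is_tautology(lambda a: a or not a, 1))  # "A OR NOT A" holds on every row
print(is_tautology(lambda a, b: a or b, 2))   # "A OR B" fails when both are false
```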

Résumé Updated

February 24, 2008

My résumé has been updated. I’m starting to wonder whether I need to trim the damned thing, since it does seem like there’s a lot on there, and some of it may stop being entirely relevant after a certain period of time. I’m still very proud of being Eastside Journal’s Most Inspirational Graduate of 2001, but how long does a high school graduation award actually matter? This is a bit of a trickier question, since I’m still in school. I’ve had people look at that document and think it way too long, while others think it proves that I have a vast array of experience (let’s ignore my personal reaction to that last opinion for the moment).

iA SUMMIT 2008 – Miami, FL

February 24, 2008

Hmm, if it weren’t in the middle of the second or third week of Spring, this would be freaking awesome. Registration would be fairly cheap, since I happen to have recently become a student member of The Information Architecture Institute.

Firefox Speed Tweaks

February 24, 2008

I found these tips on speeding up Firefox here, and it does seem to speed it up significantly even on broadband. However, a couple of the flags (I suspect) refer to older Firefox versions than what I’m currently running (2.0.0.12). Here are the ones I set in the “about:config” screen:

network.http.max-connections: 48
network.http.max-connections-per-server: 16
network.http.max-persistent-connections-per-proxy: 8
network.http.max-persistent-connections-per-server: 4
network.http.pipelining: true
network.http.pipelining.maxrequests: 100
network.http.proxy.pipelining: true
Creating a new integer value:
nglayout.initialpaint.delay: 0

I guess my only questions at this point are (a) how much these settings increase the load on web servers and (b) whether these are changes that should really be made. It seems like most of the boost came from the last new integer value, if anything at all sped it up, since painting is now nearly instantaneous. All the other flags do is increase the number of connections the browser is allowed to make (and how it’s allowed to make them, if I’m understanding the pipelining setting properly). Is there any documentation on about:config values?
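For reference, the same settings could be kept in a user.js file in the Firefox profile folder, which Firefox reads at startup as a series of `user_pref(...)` lines. The sketch below only renders those lines from the values listed above. Whether each preference still exists in a given Firefox version is not something this checks, and the helper names are made up for the example.

```python
# Preference names and values exactly as listed in this post.
PREFS = {
    "network.http.max-connections": 48,
    "network.http.max-connections-per-server": 16,
    "network.http.max-persistent-connections-per-proxy": 8,
    "network.http.max-persistent-connections-per-server": 4,
    "network.http.pipelining": True,
    "network.http.pipelining.maxrequests": 100,
    "network.http.proxy.pipelining": True,
    "nglayout.initialpaint.delay": 0,
}


def format_value(value):
    """Render a Python value as a user.js literal."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def render_user_js(prefs):
    """Return one user_pref(...) line per preference."""
    return [f'user_pref("{name}", {format_value(value)});' for name, value in prefs.items()]


lines = render_user_js(PREFS)
print("\n".join(lines))
```

Keeping the tweaks in one file also makes them easy to undo: delete the file and reset the values in about:config.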

My Personal Information Management: Not Managed, Really

February 4, 2008

Something quite interesting popped into my head, and thus prompted this post. As most know, I do a lot of reading as a part of my master’s studies, and have done a lot of reading in the past regarding a host of different topics, particularly during my undergraduate work at Evergreen. Oddly, when I’m doing academic work, I almost never like to read anything else, since my energies tend to get a bit drained from having to keep up with the academic stuff in the first place – there’s a residual effect as well, in that I seem not to like reading much for a while after the academic year has ended. Regardless, I find myself in a bit of a quandary; I’ve done a lot of reading on the subjects of sustainability and information management, but as it stands I really have no method of referencing all of that information or even recalling where something in particular cropped up.

This is a big problem, and spans a lot of different resources: textbooks, class notes, handouts, technical articles, magazine articles, programming code snippets, old web site designs, even in-line notations on whatever I’m reading. I come up with ideas for projects that (no pun intended) peter out (cough) after a while, either for lack of motivation or for lack of appropriate reference material – in general, it tends to be more the former than the latter, but lack of reference material also rears its ugly head occasionally. This isn’t because I lack the information; it’s because I’ve seen it somewhere but can’t find it again!

I’m not the only one. Not by a long shot. Everyone faces this. I have a slight advantage in that I’m beginning to recognize some of the ways that this is solvable, but I’m at a slight disadvantage in that I am not quite as involved with stuff like social tagging or folksonomies – though I should note that Wikipedia has it wrong; folksonomies and social tagging are not the same thing, and saying they are is misleading. Anyway, the main reason I have a problem is that I don’t have a quick way of finding any annotations or relevant readings for a particular topic. If I wanted to remember a bit about economics, for instance (a highly relevant subject for me at the moment because of PB AF 594), I don’t have any way of knowing what articles I’ve read related to the subject, where my books that cover the subject are, or what notes I might’ve taken in classes three or four years ago that touched on it. This is partly for lack of time to look all this crap up. It’s also partly because doing so requires locating things – like my ink in my last blog post, I may not know it’s already around, or may think I loaned the book on the subject to someone else. I actually thought I had loaned one of my economics books to my mother (don’t ask me why I thought this) until, going to bed one night, I spotted it on a bookshelf directly across from the bed!

I’ve tried recently to reduce the amount of stuff I hang on to that makes it harder to find things. I’ve started a “clippings binder”, where I rip out magazine articles that I think might be useful for future reference and recycle the rest of the magazine. I can’t bring myself to do this for my copies of eco-structure, since those are just pure gold, but most of the other magazines I have floating around succumb to this sooner or later. I can’t do this to books (and won’t – my father, who is doubtless reading this, would about have a conniption and banish me to the seventh or eighth layer of hell). Last year, before moving to Seattle, I donated a bunch of (admittedly mostly fiction) books to Olympia’s Goodwill branch to reduce the number of books I had sitting around. But really, this hasn’t done much – I still have a lot of books I want to be able to reference.

There’s an extra dimension here – not only is there stuff I have read, but there’s stuff that looks relevant that I want to read, but can’t find the time.

It seems like the only really good way of doing this would be to start creating additional notes on every single book I read that might be relevant to future work, but that in and of itself is a lot of additional work. Would it increase my ability to look for and find information? Probably, especially if it were implemented correctly (I’d guess a wiki system with some sort of tagging grafted on would work quite well for this). Perhaps I’ll take a sabbatical in 2009 after I graduate and spend the summer reading and making notes and putting them into some coherent system. Yeah, right. So how do we organize all these resources that we personally find relevant? There are answers — maybe — and those answers are (fairly) likely to be relevant. But in the meantime, if I want to remember all I’ve seen on sustainability, I’ll have to read it all over again, or at least spend a copious amount of time reading over whatever notes I made in the margins of books or on paper somewhere in a binder buried in my closet.

That’s assuming those notes existed at all, and that’s a whole ‘nother problem.

Ink

February 4, 2008

I’ve been thinking I needed printer ink for the last several weeks, since my printer is reporting that several of the cartridges are getting quite low. I had intended to order some tonight, and nearly did until I opened my filing cabinet and found refills for every single ink cartridge I have.

Well, at least I found the cartridges before I ordered new ones…

Note – I use a business-level printer that does duplexing and provides an insane amount of paper storage capacity (and it’s got a wireless connection built in to boot). Why do I use something with that much power? Home-use printers seem to fall a bit short in the areas of networking and duplexing, so I went to business models. This is an HP OfficeJet Pro K550dtwn, and thus far it has served me quite well. It helps that I keep my need for ink down by forcing all printouts to use only black ink and the “Fast/Economical Printing” setting (which is essentially draft printing). There is no visually appreciable difference between draft and normal print quality, except that draft printing uses a lot less ink.

Notes: Using Uncensored Communication Channels to Divert Spam Traffic, January 31, 2008

January 31, 2008

This was a presentation given by Benjamin Chiao from the University of Michigan – he’s currently a PhD student at their Information School, but also has an economics background, which is the lens through which much of this talk was framed.

What’s the point of solving the spam problem? Less time sorting spam, less economic cost for blocking spam, customers spend less money
$10 billion/year spent on spam related technologies
What is uncensored/open channel? keep inbox filters, no filters in special folder, guarantee delivery of messages into folder
Properly tagged messages will automatically be assigned to a folder/label
No new technological infrastructure required and fully reversible
Existing mechanisms to prevent spam: legal punishment, filters
Proposal of the open channel: decrease benefits of spamming by decreasing the number of recipients
Economics: micro-economic model shows open channels increase benefits to recipients, advertisers
This is not a unique mechanism – Chiao compared it to TV shopping channels: you don’t have to watch, but the information is constantly there
Open channel is like web sites – anyone can post
Not excluding the possibility of search within the open channel
Sender tags sent messages (as being part of the channel? This wasn’t clear in the talk)
The definition of spam used here specifically targets unsolicited commercial mass e-mails – no other message types are considered here
Current spam volumes are between 80-90% of total network traffic – 40% advertise medications, 19% is adult content, 41% other (according to Evett 2006)
Spammers continue because they are economically supported – there’s a point where the supply of spam must meet demand
Why do we need open channel? Why not just search for the content via existing search engines? Sites selling these products disappear too quickly: 30% of domains created die within a day (according to MessageLabs 2005)
Spammers need to keep pushing information to inboxes because they must move rapidly due to legal reasons
60% of spam messages are sent by zombies – computers hijacked for the explicit purpose of sending spam
The CAN-SPAM Act has essentially legalized spamming
The open channel proposal separates the current e-mail ecosystem into two ecosystems – one “open” (the proposal) and one “traditional” (the current model)
Audience observation: this system assumes that EVERY e-mail system implements the open-channel concept
Current technology already partially implements this idea (sort of)
Spammers might be happier on open channel! 😀
This is still a theoretical idea
Essentially create two channels: one open and one censored (I’m not clear on whether the “channels” are analogous to the “ecosystems” mentioned above)
E-mail recipients opt in to the open channel in order to maximize their own utility
The sender gets its current revenue from the advertising charge times the number of mails received
The sender’s current cost is the constant reestablishment of sending channels (zombies)
The open channel attempts to establish equilibrium between advertisers and receivers of spam (note that advertisers, senders, and receivers are independent parties)
There is not just a supply curve but a demand curve for UCM (unsolicited commercial mail, i.e. spam as defined above)
The open channel method induces UCM to move out of the current e-mail system
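To make the mechanics a little more concrete, here is a rough sketch of how tag-based routing might look on the receiving side. It is an interpretation of the notes above, not something shown in the talk: the header name `X-Open-Channel`, the folder names, and the toy spam check are all assumptions.

```python
from dataclasses import dataclass, field

OPEN_CHANNEL_TAG = "X-Open-Channel"  # hypothetical header name, not from the talk


@dataclass
class Message:
    sender: str
    subject: str
    headers: dict = field(default_factory=dict)


def looks_like_spam(message):
    """Toy stand-in for whatever filter the inbox already uses."""
    return "buy now" in message.subject.lower()


def route(message):
    """Pick a folder: tagged mail skips filtering, everything else is filtered as usual."""
    if OPEN_CHANNEL_TAG in message.headers:
        return "open-channel"  # delivery guaranteed, no filters applied
    if looks_like_spam(message):
        return "junk"
    return "inbox"


tagged_ad = Message("ads@example.com", "Buy now!", {OPEN_CHANNEL_TAG: "yes"})
untagged_ad = Message("ads@example.com", "Buy now!")
personal = Message("friend@example.com", "Lunch on Friday?")

print(route(tagged_ad), route(untagged_ad), route(personal))
```

The sketch also shows where the enforcement question bites: nothing here stops a sender from leaving the tag off and taking their chances with the filter.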

I’m not sure Benjamin gave sufficient background to make any of us fully appreciate the idea – there are two problems with it that I can see. First, it exists within the reality of economics, not the reality that we commonly deal with. Thus, it’s governed by the same economic laws that give me such a headache in PB AF 594, and understanding the concept requires a suspension of our own realities in order to appreciate the laws that govern the proposal. The second problem is that it’s not clear how this can be implemented within the current system. Is this a system that merely adds a tag to every message identifying it as open-channel or “traditional”? How do you physically separate the two ecosystems without actually modifying the current e-mail structure, and how do you enforce proper usage of both ecosystems? An honor system in which we assume that the senders, the receivers, and the advertisers are all working to maximize their own utility (basically their net happiness) is perfect in economic theory, because economic theory establishes that everyone will strive towards some theoretical maximum benefit, but in reality it just doesn’t seem possible.

There was one thing that I want to follow up on – Benjamin mentioned the Attention-Bond Mechanism (Loder 2006) in his talk, so I’ll have to look up exactly what that entails (it’s a concept related to the acceptance or rejection of e-mail messages).

Peter Ellis
