COCOA FAQ
By Dr. Andrew Burt,
Chair, The COCOA Association
(All opinions expressed herein are those of the author)
Table of Contents [version 1.53, 10/18/06]:
Definitions
[Top] - What's a "CDS"?
-
CDS stands for Content Display Sites -- such as Google Print,
Amazon.com's Search Inside the Book, Microsoft/Yahoo! and the
Open Content Alliance. It gets tedious writing "Google and Amazon and..."
so I'll call them "CDS" in here.
[Top] - Who's a "copyright owner"?
-
An author or a publisher. Authors are the original owners of the copyright
for works they create (except in certain cases, such as when they write
a book for hire for someone else). Authors typically license to publishers
the right to print their work via contracts. Electronic rights (to make/sell
electronic versions) are a separate right. (Other rights would be
rights to make a film of a book, translate the book to another language,
create sequels, etc.)
Some works have no copyright, such as works in the public domain.
Copyright eventually expires, meaning anyone can then do what they want with
the work. Works created prior to 1923, for example, are highly likely to
be public domain. Shakespeare, Dickens, etc. are in the public domain.
Some rights are also granted to others by law. See Fair Use below.
Note that a CDS has to either acquire the rights to a work, or have some
other legal right to it, such as if the work is in the public domain,
or the use is covered under fair use provisions.
[Top] - What's "Fair Use"?
-
Fair use refers to rights that copyright law grants that a copyright owner
has no control over. The
"Fair use" provision of the copyright law is here
(but note that a lot of what is considered fair use is determined by case
law, i.e. lawsuits that someone won or lost).
About COCOA
[Top] - What is COCOA?
-
COCOA stands for "Copyright Owners' Control of Access." The "COCOA standard"
or "COCOA Protocol" is a technical description of how copyright owners
would describe what visibility they would like for scanned page images
of their work. (It also works for other media -- music, film, etc.)
COCOA is administered by the
COCOA Association,
a non-profit established
to implement and operate the COCOA protocol (such as authenticating
members), disseminate information about COCOA, and promote its use.
[Top] - Who created COCOA?
-
Here is a list of the original committee members.
Note particularly that the design team are spread across the far ends of
the copyright spectrum, from
"copyright conservative" to "copyright liberal."
This gives COCOA broad appeal to people no matter how they feel about
copyrights.
[Top] - How does COCOA work?
-
COCOA is simple: a one-stop shop where copyright owners, from individual
authors to huge publishing conglomerates, can specify the visibility of
scanned pages and other content. Thus, if one needs to block the
last three pages of a certain short story in an anthology for contractual
reasons, one quick visit to a secure web page and blink, all CDS's
will be informed and can block pages 214-216 of that ISBN number.
Likewise, it lets publishers establish default visibility standards -
for example, "unless otherwise specified, block the last 25% of novel
pages from view" or "block every 4th page of our textbooks."
It's easy for publishers and authors to use, and easy for CDS's
to implement.
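The kinds of visibility rules described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names (`block_range`, `block_last_fraction`, `block_every_nth`, `visible_pages`) and the rule-as-function design are hypothetical and are not taken from the COCOA specification.

```python
from math import ceil

# Hypothetical rule helpers; the names and the rule-as-function design are
# illustrative assumptions, not part of the COCOA specification.

def block_range(first, last):
    """Block an inclusive page range, e.g. pages 214-216."""
    return lambda page, total: first <= page <= last

def block_last_fraction(percent):
    """Block the final `percent` of pages, e.g. 25 for the last 25%."""
    def rule(page, total):
        blocked = ceil(total * percent / 100)
        return page > total - blocked
    return rule

def block_every_nth(n):
    """Block every nth page, e.g. n=4 blocks pages 4, 8, 12, ..."""
    return lambda page, total: page % n == 0

def visible_pages(total, rules):
    """Return the 1-based page numbers that no rule blocks."""
    return [p for p in range(1, total + 1)
            if not any(rule(p, total) for rule in rules)]

anthology = visible_pages(300, [block_range(214, 216)])
textbook = visible_pages(200, [block_every_nth(4)])
novel = visible_pages(200, [block_last_fraction(25)])
print(len(anthology), len(textbook), len(novel))  # 297 150 150
```

Each rule only answers "is this page blocked?", so a copyright owner's choices can be combined freely by passing several rules at once.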
COCOA was designed with other media in mind, so that music and video
files can be described in terms of what amount could be played for free
(say, the first 30 seconds of all songs from a certain publisher or 100%
of a specific video).
COCOA is flexible, easy to use, comprehensive, and adheres to the
copyright law precept that it's the copyright owners who specify what's
visible. With COCOA, publishers and authors can place their work into
CDS's with full confidence that only what makes
sense to show of their work is what will be shown.
Details of how COCOA works are
here.
[Top] - What are the problems COCOA solves?
-
There are four primary problems with displaying scanned page images that
aren't obvious:
(1) Many copyright owners are concerned that even if security limited one
person to looking at five pages
per book, there are millions of people using "peer-to-peer"
(P2P) file sharing programs, so if
a handful of people look at five pages each, it only takes 40 people to
download a 200 page book and share the assembled results,
undetectable by Amazon. This
can be done undetectably with as few as 10 people, and copyright owners are
concerned this process may become automated.
Amazon actually limits one person
to viewing approximately 15% of the pages of one book, not five pages; Google
blocks out approximately a random 15% of the pages of copyrighted books
from view by anyone; nobody can see more than 85%. Some copyright
owners are concerned
that people have the ability
to view 85% of a book from Google and the remaining 15% from Amazon.
They worry this may become easier as more sites display pages, such as
Yahoo! and Microsoft and the Open Content Alliance.
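The arithmetic behind concern (1) can be checked directly. The sketch below assumes an idealized case in which no two people fetch the same page; the function name `readers_needed` is made up for illustration.

```python
from math import ceil

def readers_needed(total_pages, pages_per_reader):
    """Fewest readers required if no two readers fetch the same page."""
    return ceil(total_pages / pages_per_reader)

book_pages = 200

# Five pages per person, as in the first scenario.
print(readers_needed(book_pages, 5))  # 40

# Roughly 15% of the book per person, as in the Amazon scenario.
pages_at_15_percent = book_pages * 15 // 100  # 30 pages
print(readers_needed(book_pages, pages_at_15_percent))  # 7
```

Under these idealized assumptions the lower bound at 15% per person is 7 readers, which is in the same range as the "as few as 10" figure given above once some overlap between readers is allowed for.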
(2) Often a few pages have value. Thousands of short stories in
anthologies can be read in full on Amazon and Google. Some copyright
owners are concerned
that, for example, teachers could
assign readings this way to avoid having students purchase text books, or that
reference books, cook books, etc. could be used like a free online library,
without spending time and money going to the
library or bookstore (which may cost $8 or more in
time and expenses).
Entire books have had to be removed because one author of a work inside
had already sold electronic rights elsewhere.
Even with 15% per-book page limits, some copyright owners are concerned that
many books could lose value when too many pages can be read online.
An author or publisher may need to block only a small number of pages,
such as a poem or photograph or song lyric that they do not have
rights to put online.
Today, to block just one page from view, Google and Amazon require
taking the entire book out of their program.
(3) Copyright owners have the right to control digital publication of their
work. Rightly or wrongly, the law is written so that online publication
of copyrighted works (beyond fair use) must be authorized by the owner of
the rights. Some copyright owners prefer not to have their work online,
as is their right. (For whatever reason -- be it that they are concerned
about piracy or any other reasons; the law gives them this right.)
Any desired changes to this law should be made via Congress.
(4) The "All-e-book" future. It is conceivable that in the not-too-distant
future a large percentage of book content will be read electronically.
If not via an ebook reader of today, perhaps via some sort of digital paper
that allows creation of reading devices
that look and feel exactly like
ordinary paper but display computerized content.
Philips (who invented
the CD) and E-Ink are already mass-producing a promising technology.
While the economic cost from book piracy today is largely in the form
of counterfeit copies of books produced in third world countries, and there
is little evidence that online piracy of books causes economic harm,
authors and publishers are concerned that
an all-e-book future could significantly change that.
These problems -- and several others -- are
explained in greater detail here.
[Top] - How is it different for Google to index a book vs. a web page?
-
Two ways come to mind:
As far as just creating a searchable index of a book vs. a
web page, for a web page, the author put it up on a web site for everyone
in the world to come visit. Authors of books for sale generally write
them with the hope of getting paid when people buy the book -- they didn't
post them on a web site for all the public to read for free. (See also
Isn't browsing a book on a CDS the same as free
bookstore browsing or library borrowing?.) As stated
elsewhere in this FAQ, it's currently being
litigated whether it's legal to create a searchable index.
The second issue, completely aside from the first, is that the CDS's have
gone beyond a mere searchable index. They display the actual
scanned images of the pages, from which it's very simple to extract
an entire book. Even if Google wins the lawsuits about creating an
index, it will still be illegal for them to allow entire books to be
extracted (and that would be a lawsuit I doubt they could win). COCOA
solves this problem.
[Top] - What do you mean when you say current CDS's fail because they're "one-size-fits-few"?
-
Basically, they're trying for a "one-size-fits-all" solution, and failing.
The CDS's have each offered up a system where they say, "If you don't like
how our system works, tough, don't put up your books." They offer no
flexibility for a world where each book, and certainly each kind
of book (novels vs. textbooks vs. poetry vs. cookbooks etc.), have
different needs. Nor are they respectful of different views
of authors and publishers about how visible page images should be (whether
that be more or less than their rigid systems offer). They all say,
"our way or the highway." And besides just being arrogant, that doesn't work
for a lot of books.
Today, if you wanted all the pages of your book indexed, searchable, and
page-displayable, except for one page, which has, say, a photograph
or poem for which you have no right to put it up on the net for all to see,
you have to remove the entire book from the CDS. And not just the
page images, but removal from a CDS means it is no longer indexed,
no longer searchable, no text snippets, the whole works.
With COCOA, the whole book could be indexed, entirely searchable, including
text snippets, and all the page images could be displayed, except for
the one page with the problematic photograph or poem.
Blocking one short story in an anthology, or even part of a story, requires
the entire book to be removed from all CDS's, indexing/searching and all.
Textbooks, cookbooks, reference books are a similar situation: It may not
make sense to the copyright owner to allow free reading of 15% of such a
work, whereas it might make sense to the copyright owner to block every
4th page (75% of pages available to any one reader instead of 15%). Current
CDS operators don't allow this. Their attempts at
one-size-fits-all "security" systems are really "one-size-fits-a-few."
Adopting COCOA would greatly expand the number of titles in CDS systems,
and the number of pages visible in each title.
[Top] - How will COCOA greatly expand the number of titles and pages in CDS systems?
-
More titles:
Currently many copyright owners are hesitant to have their books in CDS's
because of several concerns:
(1) that CDS's show too much of their work
(because of their one-size-fits-few rigidity) thus reducing the value
(such as using a reference book online and then not needing to pay for
it);
(2) that their works might be stolen (whether it causes economic
harm or not, the fear is real, and keeps books out of CDS's);
(3) that they have no rights to display certain pages for contractual
reasons (such as a certain poem, photograph, etc.); or
(4) concerns
that the CDS's are illegally using their work (authors are notorious
for getting angry at illegal rights grabs; understandably, since rights
are what ensure authors can pay for food).
COCOA addresses all of these concerns:
(1) Only those pages that the
copyright owner is comfortable with would be shown in page image form.
If it makes sense to block every 4th page of a text book to deter online
use without paying, they can do that. Whereas today, that book would
not appear at all. Consider O'Reilly books, the technical books
with the funky animals on the cover. Most O'Reilly books are not
present in Amazon's Search Inside, even though Tim O'Reilly is a visionary
and generally liberal with electronic rights. It seems likely that a lot
of his titles (if not all) would become available if they could exercise
page-level control over what's shown.
(2) Some people believe COCOA inhibits piracy from
CDS's since all CDS's would have the same pages -- it wouldn't be possible
to assemble a book from any or all CDS's (if the copyright owner didn't
permit it; they could, of course, choose to make 100% of their pages
available if they wanted).
(3) COCOA allows specifying "all pages of this
book except for page 42", if page 42 had a photograph or poem which the
copyright owner lacked permission to display on a CDS. Today, if just
one photograph or poem can't be shown, the whole book has to
be pulled from a CDS. COCOA solves that silliness.
(4) Lastly, COCOA
is the legal, copyright-oriented means by which a copyright owner would
willingly convey the rights for their work to be displayed on a CDS --
it makes the display 100% legal, with their permission. Authors who
have control over permissions are enormously more likely to permit use
in a CDS than authors who feel their rights are being stolen from them.
Thus, with all of these concerns addressed, copyright owners will
feel comfortable putting a great deal more of their titles up on CDS's.
More pages:
Beyond the obvious fact that more titles means more pages (all those pages
from the books that are currently not in a CDS at all), there is the
increase because customers would be freed from the existing (largely
ineffective) page viewing limits that each CDS imposes.
That is, Amazon, for example, puts the following roadblocks in your way
to viewing pages:
So, if you follow the rules (which are, alas, easily broken, as noted
elsewhere), you can't see more than five pages in a row, and not more than,
say, 15% of the pages in any given book. Hit that 15% limit too often,
and you're locked out for good. That's a very severe limit on what pages
you can see.
Via COCOA, copyright owners can specify what makes sense to them, for their
books. They could authorize seeing 100% of pages without limitation, as
many authors would like to do (but aren't allowed to!). They could offer
99.9% of pages of a book, blocking just a few that are problematic.
More to the point, the credit card, login, 5-page forward/back, and "15%
max" limits can be eliminated. You could see every page in a book that a
copyright owner has authorized. If they've authorized 100%, you can read
the whole thing online (and probably share it with others).
Why? Because the copyright owner authorized it.
A survey conducted among authors showed that the vast majority of authors
would permit more pages per book than CDS's currently allow.
Moreover, consider if Google wins the lawsuits against it: Google will
still be limiting the number of pages you can see, five consecutive, etc.
With COCOA, those restrictions are unnecessary, and you'll be able to
see more pages. If Google loses the lawsuits, copyright owners will
have to authorize indexing, and COCOA makes that easier than each
copyright owner having to make a separate arrangement with each CDS.
That only results in more titles and more pages than now.
No matter how you slice it,
the net result is that you will be able to see not just pages in
books you can't see any of now, you'll be able to see far more pages in
existing books. You'll be free to see every single page the copyright
owner has authorized.
At present, you are severely limited in how many pages you can see.
COCOA increases the number of pages you can see dramatically.
Bottom line: In all scenarios, COCOA considerably increases the
number of titles and pages you can see.
[Top] - How does COCOA increase the amount of indexing and searching of copyrighted materials?
-
First, as noted elsewhere, COCOA does not prevent indexing or searching or the display of text snippet search results.
In fact, it will increase them. Here's how:
Currently, indexing/searching is inextricably tied to displaying page images.
Thus, if a copyright owner keeps their work out of a CDS (for example,
because they feel it shows too many pages, not because of the indexing),
that book is no longer indexed or searchable at all.
With COCOA, a book could be entirely indexed, entirely searchable in
terms of reporting what page a word or phrase appears on, and even
showing that in a text snippet, but, if it's the copyright owner's
desire, not have the page image be displayed.
In fact, it's possible books could have 0% of page images available but
be made searchable. COCOA allows decoupling indexing/searching vs.
page image display in a way that is not now possible.
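As a rough sketch of this decoupling, a search can report pages and snippets while the page-image decision is looked up separately. The function `search`, the `viewable_pages` set, and the index layout are all hypothetical choices made for illustration, not a description of how any CDS or COCOA is implemented.

```python
def search(index, viewable_pages, word):
    """Look up `word` and flag, per hit, whether the page image may be shown."""
    return [
        {"page": page, "snippet": snippet, "image_allowed": page in viewable_pages}
        for page, snippet in index.get(word.lower(), [])
    ]

# Toy index: word -> list of (page number, text snippet).
index = {
    "aardvark": [(42, "he gave the aardvark a cookie"),
                 (77, "the aardvark ambled about the apartment")],
}

# The owner authorizes 0% of page images, yet the book stays searchable.
hits = search(index, viewable_pages=set(), word="aardvark")
print([(h["page"], h["image_allowed"]) for h in hits])  # [(42, False), (77, False)]
```

Because the index and the set of viewable pages are separate inputs, changing what images are displayed never removes a book from search.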
Thus, substantially more books could be indexed and made searchable,
if only the copyright owner had control of page image display rules --
which is exactly what COCOA does.
COCOA does not inhibit indexing and searching -- indeed, COCOA means
more indexing and searching of copyrighted material.
[Top] - How does COCOA help the blind and visually impaired?
-
The blind and visually impaired are unfortunate victims in all this.
There's a non-profit, charitable organization called
BookShare.org
that provides books in digital form to those with visual or other
print disabilities. BookShare.org operates under the
"Reproduction for the blind" section of US copyright law,
which says they can offer such a service if they meet certain criteria.
BookShare.org offers text-to-speech software, etc. Having access
to the books all these CDS's have scanned would increase BookShare's
collection more than ten-fold -- a gigantic improvement for the visually
impaired.
The problem is that Google and Amazon won't provide their scanned images
to BookShare.org because of all the problems flying around. I've been
told that if we can solve this problem -- to which COCOA is the solution --
then the roadblocks in BookShare's path are removed.
(In the meantime, copyright owners can use COCOA to provide explicit grants
to BookShare.org; but the real benefit is when the CDS's adopt COCOA.)
[Top] - How does COCOA compare to Creative Commons, etc.?
-
Below is a table comparing COCOA to Creative Commons, GNU GPL, and Sun's
open DRM (DReaM). In brief, Creative Commons and GNU GPL are used to
grant certain, very specific rights, primarily focused on making
works freely copyable; DRM (such as DReaM) is a means to enforce rights;
whereas COCOA is a means to (a) specify any rights (not just specific ones),
(b) distribute that information to users; (c) verify the authenticity
of a license. COCOA can be used to specify CC or GPL rights, but can also
specify many other kinds of rights, both commercial and non-commercial.
COCOA is flexible enough to be used as a DRM system, though that is only
one small aspect of it.
Here is a point-by-point comparison:
| Feature | COCOA | Creative Commons | GNU GPL | Open DRM |
| --- | --- | --- | --- | --- |
| Purpose | Framework to specify & distribute rights and license information. | Set of licenses; many, but targeted to specific tasks, centered around free copying for non-commercial purposes | Set of requirements to ensure work (typically software) can be freely used and modified | DRM is a means to enforce rights to protect content (generally commercial). Open Media Commons DReaM is an attempt to make DRM more widely available; also a set of software to help others create reusable content. DReaM is not yet implemented. |
| Flexible rights / Works with any rights | Yes, can specify any kind of rights; has some predefined ones for common purposes | No, each CC license grants specific rights, cannot be extended/modified | No, each GPL license grants specific rights, cannot be extended/modified | Yes(?), since is software it should allow one to control any digital rights. |
| Requires software | No, can be implemented by humans off-line or via software | No, can be implemented by humans off-line or via software | No, can be implemented by humans off-line or via software | Yes, requires software to enforce rights; intended for use embedded in devices |
| Operational | Yes | Yes | Yes | No, still being designed |
| Helps locate owner (to negotiate other uses) | Yes, maintains contact database for rightsholders | No; identifies author but offers no contact information | No; identifies author but offers no contact information | Not a designed purpose, but may be able to use for this |
| Maintains searchable database of content | Yes; any content can be included | Only CC content | No | Not a designed purpose |
| Sampling | Yes, allows fine-grained control and offers machine-readable definitions of how to create samples | Limited; no definition of what constitutes a "sample" | No | Should be possible to write software to allow sampling |
| Can use to make "public domain" grants | Yes | Yes | No | Should be able to |
| Jurisdictions supported | Can specify any country, list/group of countries, or global. Works with laws of any country or could be used to specify country-specific rights | Applicability of detailed contractual language may vary by country; has some licenses for a limited number of specific countries | No country-specific wording | Unclear; difficult to implement in software |
| Can authenticate licenses (prevent "forged" licenses) | Yes, includes credential information and has server(s) one can use to verify authenticity of a license | No | No | Yes; this is the primary purpose of DRM |
| Layered rights / covers multiple works at once as well as individual works | Yes; an author and/or publisher can specify rights for all their works, then layer overrides on top, for specific groups of works or individual works (example: all books by a publisher - allow reading 75% for free; all textbooks by same publisher - deny reading every fourth page; specific book: allow reading 100%) | No; individual work only | No; individual work only | No; individual work only |
| Open standard | Royalty free - Yes; Modifiable / extensible - Yes | Royalty free - Yes; Modifiable / extensible - No (creators cannot modify rights language) | Royalty free - Yes; Modifiable / extensible - No (creators cannot modify rights language) | Royalty free - Yes; Modifiable / extensible - Yes |
| Allows time-limited grants | Yes | No | No | Yes |
| Requires creator to share work non-commercially | No, can be used for both free and/or commercial licences | Required (is the intent); sampling licences can be used to limit free sharing to just samples | Required (is the intent) | No, intended for commercial use, could be used for non-commercial licenses |
| Requires allowing users to make/distribute derivative works | Not required; terms set at owner's choice | Not required; terms set at owner's choice (can choose "no derivatives" licenses) | Required (is the intent) | Not required; terms set at owner's choice |
| Requires inclusion of license in content itself | At owner's choice (licenses can be embedded in content if desired; or granted separately, such as to avoid altering existing content or physical products) | Yes, license must accompany content, per the license | Yes, license must accompany content, per the license | Yes, must be present to be processed by software |
| Target property | Any property (geared toward digital/intellectual property but could be used for tangible property as well) | Artistic/digital creations (text/video/audio/software; inapplicable to other kind of property) | Software, documentation (some applicability to other artistic creations, inapplicable to other property) | Any (though unclear how applies to tangible goods without software) |
| Both machine and human readable license | Yes, both human and machine readable, in XML and plain-text forms | Yes, both human and machine readable, in XML and plain-text forms | No, human readable only | Unclear - Machine readable, unclear if produces human readable licenses |
| Ability to enforce content access restrictions | Yes, at owner's choice, not required | No | No | Yes, Required |
| Specific grantees | Yes, can specify who license is granted to, or grant to the public | No, can only grant to the public | No, can only grant to the public | Yes, can specify who license is granted to, or grant to the public |
| Allow multiple grantors | Yes, content with multiple rightsholders can grant rights at separate times, and multiple rightsholders can be authenticated via multiple digital signatures on the license | No, all rightsholders must act as if one single rightsholder | No, all rightsholders must act as if one single rightsholder | Yes (assumedly) |
| Defines specific algorithms for sampling | Yes, includes many useful algorithms to define what a "sample" is for a given work; other algorithms may be added | No; no definitions of what constitutes a "sample" of any kind | No; no definitions of what constitutes a "sample" of any kind | Not addressed, but assumedly could, since is software-based |
| Unique ID for each work for tracking purposes | Yes (a COCOA-ID) | No | No | Not addressed, but assumedly could, since is software-based |
| Includes product ID (ISBN, SKU, etc.) for tracking outside the rights system | Yes, can include ISBN, SKU, etc. | No, generally not included, would have to add manually | No, generally not included, would have to add manually | Not addressed, but assumedly could, since is software-based |
| Source of grant | Works with any (the COCOA record itself can be the legal source of the grant, or it can contain embedded copies of contracts, pointers to existing contracts, Creative Commons/GPL/etc. licenses, etc.) | Only works for Creative Commons licenses | Only works for GNU licenses | Not addressed, but assumedly could, since is software-based |
| Includes content description | Yes, can include description of content (and/or sample, to help identify work) | Yes, XML record can include a description | No | Unknown |
| Media types supported | Any | Any | Primarily for software and documentation | Any |
| Facilities to link to external systems (such as other rights control systems) | Yes | No | No | Yes |
| Can use license to locate content itself | Yes; a license that is separate from its content can point to an authoritative source for the content | Limited; has URL to source but generally doesn't permit licenses separated from content and lacks authentication to ensure authoritative source for content | Very Limited; creator can include location of source but generally doesn't permit licenses separated from content and lacks authentication to ensure authoritative source for content | Yes |
| Allows confidential grants (to allow private grants to be distributed via public databases, or keeping contractual terms private, etc.) | Yes; if desired, any/all fields in a record can be encrypted so are only readable by grantee | No | No | Unknown |
| Searchable database | Yes, searchable for author, title, type of work, use; entire database public, to enable any kind of search desired (via own search engine as well as Google, etc.) | Yes, searchable for author, title, certain specific uses (via Google) | No | Not addressed, but assumedly could, since is software-based |
| Authentication/enforcement at time of use | Yes, if desired | No | No | Yes, is primary purpose |
| Allows rights to vest with person, not just device | Yes | Yes | Yes | Yes (typical DRM vests with a device, DReaM can vest with a person) |
| Level of legal binding | Can bind legally, voluntarily, or combination | Legally binding only | Legally binding only | Not addressed, but assumedly could allow either, since is software-based |
| Dynamic licenses (change over time, e.g. to allow subscriptions or other conditions) | Yes | No | No | Yes |
| Work with rights other than copyright (e.g. patents, contracts) | Yes, any rights | No | No | Not addressed, but assumedly could, since is software-based |
| Interoperability | Can work with other licensing systems so long as it can be described in plain text (such as Creative Commons licenses, GNU); can work with others e.g. DReaM, if users have the other systems' necessary software installed. | No | No | Unclear; may allow writing modules for other license frameworks. |
For more information, visit...
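The "Layered rights" entry in the comparison lends itself to a small sketch: a default for all of a publisher's works, an override for a group of works, and a further override for one specific book. Everything named here (`resolve_rule`, the key layout, "ExamplePress", the "BOOK-00x" identifiers) is a hypothetical illustration, not the actual COCOA record format.

```python
def resolve_rule(layers, publisher, category, book_id):
    """Return the most specific rule that applies to one book."""
    lookup_order = [
        ("book", book_id),                  # individual work
        ("category", publisher, category),  # group of works
        ("publisher", publisher),           # all works by this owner
    ]
    for key in lookup_order:
        if key in layers:
            return layers[key]
    return None  # no grant recorded

# Hypothetical layers mirroring the example in the comparison table.
layers = {
    ("publisher", "ExamplePress"): "allow reading 75% for free",
    ("category", "ExamplePress", "textbook"): "deny reading every fourth page",
    ("book", "BOOK-001"): "allow reading 100%",
}

print(resolve_rule(layers, "ExamplePress", "novel", "BOOK-002"))
print(resolve_rule(layers, "ExamplePress", "textbook", "BOOK-003"))
print(resolve_rule(layers, "ExamplePress", "textbook", "BOOK-001"))
```

The three calls fall through to the publisher default, the textbook override, and the single-book override respectively.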
[Top] - How would I use COCOA in conjunction with a Creative Commons, GNU, or other kind of license?
-
Very simply: When filling out the form to create a COCOA record,
(1) choose Grant Source as "Per Creative Commons or GNU license" and
(2) include a copy of the license text or URL pointing to it.
Common Misconceptions
[Top] - Won't COCOA prevent a CDS from indexing and searching books?
-
Basic answer: NO. Adopting COCOA will not prevent indexing and searching.
(It will, in fact, cause an increase.)
This is a slightly more complex question because the AAP (Association of
American Publishers) and AG (Authors Guild) have
sued Google over this, so you'll have to pay close attention here:
Because of these lawsuits, it's now for the courts to decide if it's
legal for Google to (a) make an index of all the words in a book,
(b) make that index available online, then (c) present (say) 20-word
"snippets" of the original text as part of the search results. (How
the courts will rule I can't guess. As
Charlie Petit notes,
"One of the rights of copyright holders is the preparation of indices of
their works." So the courts could rule that Google can't index copyrighted
material without permission. However, the courts might also find that they
can -- VCR taping was considered illegal until Sony challenged and won.
So who knows?)
This is NOT a statement of desirability, whether I want that to be legal
or not -- it's a statement of fact that this is being litigated,
and the resolution is not clear.
Now.
COCOA has nothing per se to do with preventing indexing. Let's suppose,
purely hypothetically, that the courts ruled that Google's indexing/searching
was legal. Play along with me. The problem COCOA addresses still exists,
because COCOA is about who has complete copies of books, and indexing and
searching don't require storing complete copies of books:
You don't need to make a copy of a book to build an index. You just write
down every page where every word occurs. Maybe you write down a few words
of context around each word. But you don't have to store all those words
in order of the book itself. This is not splitting hairs: A file with
all the words that appear in this web page, listed in sorted order, "a a a a
a ... aardvark aardvark ... about about about", is NOT the same as this web
page.
You don't need to make a copy of a book to return a search result, such as,
"the word 'aardvark' appears on page 23 of this book." You could do
that entirely without ever making a copy of the book. (You could make
a list of words and what page they appear on: 'aardvark', page 23, 42, 77...
'about', page 18... 'the'...)
You could return "snippets" (say 20 words of text) around every word
as search results without storing a verbatim copy of the book.
How? You could have a database that looks like this:
...
aardvark:
    page 23
        "aardvarks are fun and frolic in the sun" [20 words max]
    page 42
        "he gave the aardvark a cookie"
    page 77
        "the aardvark ambled about the apartment"
...
about:
    page 18
        "Bob said, "That's about it," and the gate opened"
    ...
...
the:
    ...
That doesn't involve storing a copy of the book. That's an index.
So a CDS could do all the above without making a copy of a book.
(Whether Google or Amazon etc. do it that way is another question.
But remember we're working on the hypothetical assumption for the moment
that indexing/searching itself is found legal.)
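A minimal sketch of how such a word-to-page-to-snippet database might be built follows. The function `build_index` and its `context` parameter are assumptions made for illustration; nothing here describes how Google, Amazon, or any other CDS actually builds its index.

```python
import re
from collections import defaultdict

def build_index(pages, context=9):
    """Map each word to (page number, snippet) pairs.

    `pages` maps page numbers to page text. Each snippet holds at most
    2 * context + 1 words, so context=9 stays under the 20-word cap.
    """
    index = defaultdict(list)
    for page_no, text in sorted(pages.items()):
        words = text.split()
        for i, raw in enumerate(words):
            key = re.sub(r"\W+", "", raw).lower()  # strip punctuation
            if not key:
                continue
            window = words[max(0, i - context): i + context + 1]
            index[key].append((page_no, " ".join(window)))
    return index

pages = {
    23: "aardvarks are fun and frolic in the sun",
    42: "he gave the aardvark a cookie",
}
index = build_index(pages)
print(index["aardvark"])  # [(42, 'he gave the aardvark a cookie')]
```

The result is keyed by word rather than by reading order, which is the distinction the text above draws between an index and a copy.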
The serious problem arises when a CDS allows others to use their
system to make a copy of copyrighted works without permission.
It was originally possible to extract a complete text copy
of a book from these systems by searching on certain words then building
a chain of all the words in a row -- extracting the whole book. When we
demonstrated this, they made changes to prevent it. So that hole is closed,
and not an issue as far as I'm concerned. (Any future CDS would have to
likewise prevent plain text copies from being built, of course.)
However, CDS's also let you see the scanned
images of those pages. As documented elsewhere,
it's extremely simple, and easily automated via software, to extract all
those page images. That gets you the whole book -- a copy that wasn't
authorized by the copyright owner. (And it has nothing to do with indexing
or searching.)
THIS is what COCOA addresses. Page images. COCOA lets the copyright
owner specify which page images can be shown. From 0% to 100%, in assorted
convenient ways, which should have the net effect of making more pages of
more books available today. But nothing per se to do with indexing or
searching.
Now, that's not quite true, as COCOA does relate to the lawsuits over
scanning/indexing/searching/snippets in the following two ways:
(1) If scanning-to-index-and-providing-snippets is ruled illegal [that's
IF! folks, IF!], then Google will have to obtain
permissions to scan-to-snippet, and copyright owners could choose to
use COCOA to grant scan-to-snippet rights.
(2) With the legality of scan-to-snippet up in the air, copyright
owners could use COCOA today as a simple way to grant that right, if
they wanted Google to scan/index their work without any doubt about
whether Google has the right; if scan-to-snippet is later ruled to be
fair use, then the grant is moot and harmless.
But, by and large, COCOA has nothing to do with indexing, searching, or
showing text snippets. COCOA has to do with showing scanned page images.
Indeed,
COCOA will increase the amount of copyrighted material that can be searched.
[Top] - Isn't browsing a book on a CDS the same as free bookstore browsing or library borrowing?
-
It's different for these reasons:
(1) The library paid for the copy you're borrowing. (Or somebody
paid for it, in case the book was donated to the library.) Thus the author
was paid for that copy. If you read a whole copyrighted book via a CDS
and never buy the book, the author wasn't paid. Copyright law is about
creating new copies; you're not creating a new copy when you read
in a store or from a library.
(2) Browsing in a bookstore is pretty inconvenient. You can't take the copy
with you to look at any time you want. (Unless you buy it! That's sort of
the point.) Bookstores know that few people really read entire books
in the store -- else they'd go out of business. However, reading a
book from a CDS doesn't have that limitation: You can take it
with you, on your laptop, etc. This is particularly critical in light
of digital paper, when the digital copy is the paper copy.
(3) Library and bookstore reading isn't anywhere near free: You have to
move your physical body to the bookstore to read. For one thing, you can't
likely do that at 3am. (And certainly not in your pajamas.) You can't
do it from your bed, couch, or desk, without getting up. You have
to spend time to move your body down there, which might be
10min-30min each way; 20-60min round trip, plus say 10min to find the
book, a place to sit, etc; call it 30-70min. If you value your time
at say, $10/hr, that's $5-12. Then there's the cost of transportation.
If the library/bookstore is three miles away, 6mi. round trip, and gas
costs $2.50/gal., and you get 20mi/gal., that's another $.75. The IRS
figures driving a car costs $.405/mile in repairs, wearing it out, etc.,
so that's another $2.40. So you're at something like $8-15 to go read a
"free" book.
Really -- if it were that free, people would do a lot more of it.
Yet reading a free copy from a CDS doesn't have those limitations. It is
much closer to $0, actually and truly free. THAT's the problem.
(4) Your library or bookstore might not have a physical copy on hand.
Then you're either spending more time/money to move your body to another
library/bookstore that has a copy, or you don't get one. Whereas, a copy
can always be available online. Another example of libraries "costing"
more to borrow from.
(5) You can't pass on a "free" copy you read in the store or from the library.
You have to leave the book at the bookstore (or buy it); you have to return
the book to the library. Reading a book in digital form that was stolen
from a CDS, you could pass that copy on to others by email, via a web
page, P2P software, etc.
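The arithmetic in reason (3) can be checked with a short sketch. The function name trip_cost and its parameter names are made up for this illustration; the default values are the figures quoted in (3), and gas and the IRS per-mile figure are counted as separate items, as they are there.

```python
def trip_cost(minutes, hourly_value=10.00, round_trip_miles=6.0,
              gas_price=2.50, miles_per_gallon=20.0, per_mile_wear=0.405):
    """Dollar cost of one trip to read a "free" book, per reason (3)."""
    time_cost = minutes / 60.0 * hourly_value            # value of your time
    gas_cost = round_trip_miles / miles_per_gallon * gas_price
    wear_cost = round_trip_miles * per_mile_wear         # IRS per-mile figure
    return time_cost + gas_cost + wear_cost


low = trip_cost(30)   # shortest total time in (3)
high = trip_cost(70)  # longest total time in (3)
print(f"${low:.2f} to ${high:.2f}")  # prints "$8.18 to $14.85"
```

That matches the "something like $8-15" estimate.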
So, bottom line, bookstore/library reading is fine and dandy, since
it isn't really free. CDS copies are essentially free, and that's
the problem. They're too convenient to read free. While this may
not be an economic harm today (and may in fact generate sales of print
copies of books, today), the concern is for the future, should most reading
be done digitally, when illegally obtained free copies are then as
valuable as legally obtained ones.
Critics Addressed
[Top] - McDaid's humorous parody -- funny, but misses the point
-
I enjoyed John McDaid's humorous parody
of the COCOA call to action,
though of course it entirely misses the point :-), which is that having a
copy of a book to read that you purchased is precisely the legal right paid
for with the purchase (or that a library purchased, or that somebody
purchased), whereas there's no purchase nor conveyance of rights if someone
uses software to obtain a copy from a CDS's database of images.
[Top] - BoingBoing bounces off-target
-
BoingBoing
perpetuates the McDaid link, and goes further astray by decrying COCOA for
preventing indexing and searching.
As noted above, COCOA does not prevent indexing or searching or the display of text snippet search results.
Cory Doctorow said in private email that he would amend his
statement (though he hasn't yet).
Miscellaneous
[Top] - What's a "copyright liberal" vs. a "copyright conservative"?
-
A "copyright conservative" would be someone who is highly protective
of copyrights and defends them vigorously; I'd say Harlan Ellison is
an example. A "copyright liberal" is someone who is open to trying
things and not as worried about piracy, etc.; I'd say someone like Eric
Flint of Baen Books (creator of the Baen Free Library) is an example.
(In the context of the makeup of the group that drafted COCOA,
Harlan Ellison's attorney and Eric Flint are both on the committee.)
[Top] - Isn't it legal to read a CDS copy if I own a print copy?
-
That's an interesting question. I don't know if it's been litigated yet.
I see both sides of this. Pro: If I already own a copy, this is like
me making a portable, personal copy for quick use elsewhere. Con: If
you buy a paperback copy, you aren't entitled to a hardback copy for free.
A CDS copy of a book adds a lot of features, like searchability,
cut&paste, reading via ebook reader (or future digital paper books)
that you didn't pay for when you bought a paperback. Some of those
might be ruled fair use by the courts, but, as I said, I don't think
these have been litigated yet. Amazon.com is offering access
to online copies of books you've bought, for a price. So all this is
up in the air right now.
Regardless, a CDS can't make an online copy of a whole copyrighted book
available to those who haven't paid for it, without permission of the
copyright owner.
[Top] - Is COCOA hard for CDS's to implement?
-
It should be really simple for them. Basically COCOA identifies books
and pages within those books -- what can be shown, what can't be.
It should be a simple matter to prevent the blocked page numbers from
having their JPEG scanned images appear in their database of page images.
("If (page_is_not_blocked(page_number, book_id)) show_page()...")
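To flesh out that one-liner a bit, here's a rough sketch. It is not part of the COCOA standard itself: the BLOCKED_PAGES table, the book id, and the image file naming are all hypothetical, and it assumes (purely for illustration) that a book with no entry has no blocked pages. A real CDS would fill the table from whatever permissions the copyright owner filed.

```python
# Hypothetical stand-in for the permissions a copyright owner has filed:
# book_id -> set of page numbers that must not be shown.
BLOCKED_PAGES = {
    "example-book-1": {5, 6, 7},
}


def page_is_not_blocked(page_number, book_id, blocked=BLOCKED_PAGES):
    """True when the owner has not blocked this page of this book."""
    # Assumption for this sketch: unknown books have no blocked pages.
    return page_number not in blocked.get(book_id, set())


def show_page(page_number, book_id, blocked=BLOCKED_PAGES):
    """Return the page image name to serve, or None if the page is blocked."""
    if page_is_not_blocked(page_number, book_id, blocked):
        return f"{book_id}/page-{page_number}.jpg"  # hypothetical naming
    return None
```

The point is only that the check happens before any page image is served; everything else about the CDS stays the same.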
[Top] - What rights are involved in CDS displays?
-
Well, here's a problem area, as this is not clear. All rights originate
with the author of a work. The author then signs a contract conveying
certain rights to a publisher; but rarely do they convey all rights.
Typically a book contract will call for "electronic rights", meaning the
publisher has the right to produce an electronic edition of the book.
However, what rights are actually conveyed to whom is in no way standard.
For example, an author of a short story in an anthology may not have sold
electronic rights to the publisher of the anthology, yet the publisher may
have conveyed electronic rights to the entire anthology to a CDS for display.
Another avenue some CDSs have mentioned is the use of marketing rights.
That is, using small portions of a work in order to market the whole. This
could be problematic for a CDS when the majority or entirety of the work
can be viewed by multiple users.
Some publishing contracts with authors spell out specific rights for CDS usage.
Authors may wish to include specific language in their contracts
so there are no miscommunications. References to COCOA-created grants may
also make sense in this context.
[Top] - How come people signed the petition multiple times?
-
They really like it!
But seriously, it's a function of browsers and the petition site.
If you go 'back' in your browser or reload the signing page, you
risk getting signed up twice. There aren't that many duplicates, and
they're very obvious if you take the whole list and sort it by name.
[Top] - What can I do to help?
-
Urge Google, Amazon, Microsoft, Yahoo!, the Open Content Alliance, etc.
to adopt COCOA. The two best ways to do this are:
1) Please SIGN THE PETITION at:
http://new.petitiononline.com/cocoa/petition.html
It's worded for brevity; the details are at the COCOA web
site:
http://www.CopyrightAccess.com
2) Please SPREAD THE WORD: Urge others to sign the petition,
learn about COCOA, and likewise encourage others to sign
the petition, spread the word, and urge yet others to..........
Please post on your blogs, tell journalists you know,
put links on your web pages, etc.
THIS LINK has a brief 'call to action'
you can copy and spread around. Thanks!