Solving Copyright Woes with a Cup of COCOA
Draft of October 16, 2006
Andrew Burt
Institute for Digital Security
P. O. Box 16143
Golden, CO 80402
aburt@cs.du.edu
Introduction
The Authors Guild has sued Google, Inc. for "massive copyright infringement"[1] over its Google Print program, specifically Google's plan to scan the pages of every book in several large libraries without the permission of the authors or publishers -- the copyright owners. The Association of American Publishers then sued Google over the same program. The Authors Guild has also issued warnings[2] about Amazon's Search Inside the Book program, in which Amazon has scanned hundreds of thousands of books to make them searchable by customers and to let customers read the scanned pages. This project, too, stands on questionable copyright ground. These issues are being discussed and negotiated at the highest levels; the author of this article is the lead negotiator for SFWA (Science Fiction and Fantasy Writers of America, Inc.), dealing with senior executives of Amazon, Google, the AAP, the AG, etc.
These projects, and others such as those from Yahoo!, Microsoft, and the Open Content Alliance (OCA), are wonderful uses of technology. They hold the promise of making more content known to more people and, from an author's perspective, of selling more books.
However, for all their promise, there are some significant problems that must be overcome. We'll explain what those are, differentiating them from what Google et al. are trying to do. Lastly, we'll present a solution to these problems -- COCOA, the Copyright Owners' Control of Access standard. COCOA not only solves these problems but also addresses other, similar ones, and it has the potential to greatly increase the amount of searchable, scanned page images of copyrighted text on the Internet.
This is Different from the Authors Guild and AAP Lawsuits
The AG and AAP have sued Google over whether it is legal to copy an entire book for the purpose of making it searchable and displaying small "snippets" (text or page images) as search results. That is not the question at hand here. The question here is not so much whether scanning a book is acceptable, but what one does with the scanned images.
It's for the courts to determine whether it's legal to scan a complete copy that is never made available to others in toto, but only in very small quantities. This is what Amazon and Google purport to do; it is certainly their goal.
The same applies to returning text search results: snippets of the text itself in plain-text format, as opposed to graphical images of the pages. To accomplish this, Google and Amazon have run optical character recognition (OCR) software on the scanned images. Again, this is for the courts to rule on.
Thus, depending on how the courts rule, it may turn out that scanning an entire book and displaying a small number of page images and snippets of plain text will be found acceptable. The emphasis is on the small number, and that's where the problems arise.
Partial Solutions
Both Amazon and Google attempt to limit what visitors can see, with the intent that no one person can get more than a small amount of a copyrighted work. Once a visitor has located a page, both limit the 'forward' and 'back' icons to two pages in each direction, for a total of five pages in sequence. Amazon also has an overall limit of around 15% of a book's pages per customer (per credit card number). Google takes a different approach and omits approximately 15% of page images from its database, apparently selected at random but weighted toward the end of the book; nobody can see these pages no matter what.
Both also try to prevent users from downloading page images, but these measures are little deterrent: to be viewed at all, the JPEG files must be sent to the user's browser, where they sit in the browser's cache directory.
Problems
The overriding concept is that you can't make more than a small amount of a copyrighted work available without permission of the copyright owner.
There are several ways in which Google, Amazon, etc. run afoul of this legal concept:
1) Playing one off the other. It's possible to obtain 85% of the page images of a book from Google, then use one's 15%-per-book limit at Amazon to obtain the rest. This will only get easier as more sources are available, such as Yahoo! and the OCA's. It's likely other major players will enter this market, such as Microsoft, AOL, Barnes & Noble, Ask Jeeves, etc. Sum up the per-user percentages each allows, and if it's 100% or more, it's trivial for one person to obtain all the page images of any book. It's already 100% with just Amazon and Google.
Furthermore, this can easily be automated. (We've already done it. It wasn't difficult.)
2) A few dozen peer-to-peer nodes, five pages each. Even if Amazon could limit each user to five pages per book (a rather severe limit), extracting all 300 pages of a 300-page book takes only 60 file-sharers, each grabbing five pages and sharing them with the rest. With at least 200 million users of P2P software[3], it shouldn't be difficult to find 59 "friends." We've already demonstrated the efficacy of this attack, and again, it's easily automated in software.
Amazon et al. cannot tell who the pirates are at five pages apiece, but to be absolutely certain, the colluders can have each page viewed by two users, one who downloads it and one who only looks (thus either 10 pages each for 60 users, or 5 pages each for 120 users). Amazon can't bring legal action against everyone who views page images; it would have to demonstrate which users are the pirates, and this approach makes that impossible.
The number of colluding peers using some kind of "Amazonster" software would in practice be substantially smaller, since the software would naturally take advantage of all the sources for page images -- five pages from Amazon, another five from Google, another five from Yahoo!, etc. -- so each user can contribute 15 or more pages. For a 300-page book with three five-page sources (current and very conservative numbers), it takes only 20 peers to share the whole book.
In reality, it's unlikely Amazon could identify users as pirates if they downloaded ten pages each from each source, which drops the number of peers needed to 10 (or 20 for maximum safety, with every page viewed by two users but shared by only one).
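The peer counts above are ceiling divisions of the book's length by the number of pages each peer can collect. A minimal Python sketch reproduces them; the function name and parameters are illustrative, not part of any existing tool.

```python
def peers_needed(total_pages: int, pages_per_source: int, sources: int,
                 views_per_page: int = 1) -> int:
    """Colluding peers needed so that every page of a book is collected.

    Each peer takes `pages_per_source` pages from each of `sources` sites.
    With views_per_page=2, every page is viewed by two peers but shared by one.
    """
    pages_per_peer = pages_per_source * sources
    total_views = total_pages * views_per_page
    # Integer ceiling division: a partial last batch still needs a whole peer.
    return -(-total_views // pages_per_peer)


print(peers_needed(300, 5, 1))      # 60: five pages from a single source
print(peers_needed(300, 5, 1, 2))   # 120: each page viewed by two peers
print(peers_needed(300, 10, 1, 2))  # 60: ten pages each, double viewing
print(peers_needed(300, 5, 3))      # 20: five pages each from three sources
print(peers_needed(300, 10, 3))     # 10: ten pages each from three sources
print(peers_needed(300, 10, 3, 2))  # 20: same, with double viewing
```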
There is no way to distinguish "Amazonster" downloads from an ordinary user browsing the site. A simple approach is to use the VBScript language (provided in every copy of Microsoft Windows) to control Internet Explorer to simulate a user browsing book pages (complete with human-like time delays). The human user of the Amazonster software would use a regular browser session to log into Amazon, Google, and Yahoo! as needed, once per session (i.e., once per book or set of books), or could trust Amazonster to do it for them. Since the behavior is identical to that of an ordinary user browsing five pages, there is nothing abnormal to detect.
Once a book has been extracted, it can be compiled into a PDF or ZIP file for use by others on conventional P2P networks. (Page images run around 100 KB each, so a 300-page book comes to about 30 MB, a small file by P2P standards.)
3) Existing limits are insufficient for many kinds of books. Even ignoring attacks #1 and #2, limits such as 15% of all pages, or even five sequential pages, are not restrictive enough for many textbooks and reference books. Even forward-thinking publishers such as O'Reilly & Associates do not widely participate in Amazon's Search Inside program, because they want alternative limiting methods and per-title control to handle special cases.
Short fiction anthologies are another example. Under a 15%-per-book limit, or even a five-sequential-page limit, many short stories in an anthology can be read in full. For authors this is often completely unacceptable, since they typically sell electronic rights to for-profit web sites such as fictionwise.com. Because such stories could otherwise be read for free, entire anthologies must be kept out of the Search Inside program: Amazon does not allow per-page blocking of page images; it's all or none.
Even the example cited in the Wall Street Journal[4] contains a link to a Google Print anthology by science fiction author Jack Vance, where the link points to a three-page short story, readable in its entirety with just two 'page forward' clicks.
A flexible system would allow generic, algorithmic limit setting, ranging from "the following pages of this ISBN" to "block the last 25% of novels and block every 4th page of textbooks."
The current one-size-fits-all approaches are insufficient, with the result that a huge number of books are kept out of the programs entirely.
Removing these restrictions and allowing specifications down to the ISBN level would bring needless complexity if Amazon, Google, Yahoo!, etc. each had its own mechanism for specifying which pages may be viewed; hence the need for a unified protocol.
4) Existing visibility is too low in many cases. Many publishers want more visibility than Amazon and Google allow. A number would like 100% of a book's pages to be readable by any user. One publisher wishes to do this with nearly all of its titles, restricting a few pages here and there for specific rights reasons (e.g., contracts that prohibit display). Amazon's structure is too rigid to allow for this. Google's structure technically allows it (Google already blocks specific pages), but there's no mechanism for specifying which books get what visibility, or for designating which pages to block. Thus, copyright owners' wishes are being overridden and denied.
5) Digital paper. Or, alternatively, digital ink (or any other innovation that leads to most reading being done digitally). It appears likely that within a decade or two there will be products that look and feel like ordinary paper but display dynamic content. Philips and E-Ink already have a promising technology like this in production.
While the economic impact of book piracy today is largely in the form of counterfeit print copies produced in third-world countries, copyright owners are concerned that true digital paper, or any other scenario in which most reading is done digitally, could enable far more book piracy worldwide. That is, the fear is that when there is no discernible difference between a print book of today and a digital book -- an ordinary book that is simply digital -- readers will have no economic incentive to pay for content that can be had freely and easily via P2P networks. This could be a significant economic shock to the publishing industry, a $30 billion/year industry.
While there are techniques to address the piracy problem[5], they cannot work when the sources of the stolen material are legitimate companies such as Amazon, Google, and Yahoo!. This part of the problem could be solved today by ensuring that legitimate companies are not sources of pirate data.
6) Opt Out is never friendly. Google Print's approach to copyrights is "we'll take your content unless you tell us not to." Most people hate it when they have to take an action to prevent something unwanted (opt-out) rather than having to authorize the action in the first place. "We'll bill you $100/month unless you tell us not to" is an unfriendly business practice, albeit a successful one, unfortunately.
Copyright law is clear, however: Copies are illegal unless specifically authorized, either by the owner or by law. Opt-in. Recording a show on your TiVo is authorized by law. A publisher printing copies of an author's book is authorized by the author, via a contract with the publisher. Google Print runs into difficulty here, by having neither legal backing nor author permission.
Providing snippets of text or a small number of pages is likely legal. The problem is that Google, Amazon, etc. allow more than these small bits, as noted in problems #1-3.
Opt-out is neither friendly nor legal.
7) Finding the actual electronic rights owner is tricky. Amazon relies on contracts with publishers; Google relies on a mix of contracts with publishers and opt-out grabs. A problem with both is that the publishers may not themselves hold the rights needed to put all of a book's pages on the web.
A publisher's contract with the author may not include electronic rights, or, in many cases (notably with Google Print), the book may be out of print with all rights reverted to the author. The publisher in such a case is no longer authoritative, yet it is the party Google and Amazon deal with. (At this writing it's too early to tell about others, such as Yahoo!.)
Copyright law is clear: No copies without permission of the current copyright holder. To be legal, Google must research each and every library book it scans to determine who the rights holder is and obtain that holder's permission.
A solution to this problem should enable authors as well as publishers to designate which pages of their books they're comfortable with Amazon, Google, etc. displaying on their web sites, be that 0%, 100%, or in between.
8) Finding any copyright owners can be very difficult, particularly for older works[6]. The Copyright Office is currently looking into the "Orphan" copyright problem, and we've proposed a methodology for addressing this [7]. There is no central repository of contact information for copyright owners. Web pages, for example, are owned by whoever wrote them, but rarely are they registered with the Copyright Office. Nor does the Copyright Office obtain changes of address, making it very difficult to find authors who have moved or died. Tracing the current ownership of rights gets particularly muddy in cases of death and corporate bankruptcy. Yet if the proper permissions are not obtained from the rightful owner, the owner may sue for damages that run as high as $1,000,000, depending on how the content was obtained.
Solution
A solution to all the above problems is a one-stop shop where copyright owners can grant permissions to use their works to organizations like Amazon, Google, etc. This is what we've developed and dubbed the COCOA Protocol.
COCOA was designed by an industry committee representing a broad mix of skills: authors, editors, publishers, rights experts, copyright law experts, programmers, etc.; as well as people spread across the spectrum from "conservative" to "liberal" with regard to how copyrights should be handled. The hope is that a consensus reached within such a diverse group will represent a position that is widely acceptable.
The charge of the committee was to craft a simple process whereby copyright owners, from individual authors to large publishing organizations, can specify the visibility of scanned pages for technologies like Amazon's Search Inside the Book, Google Print, and others that enter this space.
Thus, if one needs to block a certain short story in an anthology for rights reasons, all it takes is one quick visit to a secure web page; Amazon, Google, and everyone else will then be informed and can block those pages. Likewise, publishers can use it to establish default visibility standards (e.g., "unless otherwise specified, block the last quarter of the pages of novels from view").
COCOA was also designed to use a uniform format with other media in mind, such that music and video files could be described in terms of what amount could be played for free (e.g., first 30 seconds of all songs from a certain publisher or 100% of a specific video).
To this end, COCOA specifies:
- Who is making the request (e.g. publisher or author) and a way to verify they are who they say they are
- What works are covered (e.g. a specific title, or all of an author's work, or all books by that publisher)
- Which parts of the work may be shown (in a flexible way), e.g., specific pages, "first half," "block every third page," etc. Other methods for specifying what to make visible or block can be added as they are devised.
An author or publisher would use a web page to create the record. That information would then be available to all organizations that display content.
COCOA uses a cascading priority structure, so it handles both the "default" approach a publisher might choose for all its books (say, "block every third page in our books" or "block the last third of novels and every third page of textbooks") and the wishes of a single author (Pat Author specifies, "For my books, show only the first 20%, but for my novel The Foo and the Bar, you can show the first 75%"). The more specific cases override the more generic ones, so an author overrides a publisher, etc. Thus a publisher default at priority 500 is overridden by an author default at priority 50 or by a single-ISBN record at priority 1.
Thus, a publisher could establish a setting for all their books with just one entry, in a secure manner; an author could cover all their books with one (secure) entry; and it allows a single book to be set differently.
COCOA allows for many different styles of blocking pages in books (or other content): showing none of a book, 100% of a book, or a specific list of pages, plus several common styles such as "block every 3rd page" (or every 2nd, 4th, or 10th page, whatever), "block all but the last 1/3" (last 25%, last half, only show the first 10%, you name it), as well as named styles such as "Amazon standard visibility" or "Google Print standard visibility" for those who are comfortable with those. COCOA allows for adding additional schemas.
Technical details
A COCOA "record" would contain the following information:
Version of protocol
(so new functionality can be added without breaking existing implementations; for example, Version 1 may only cover text, while audio and video could get added to Version 2.)
COCOA Record ID
A unique number assigned to each record to easily identify them.
Operation - "Grant" or "Revoke"
What is being done: "Grant" means a Grantor is granting a certain use to a Grantee. Revoke means the grantor is revoking the use previously granted (assuming the original grant allowed for revocation).
Grantor
Identity of the copyright owner granting the Use. This could be a publisher who has a contract to publish the work and who has the necessary rights to convey, the author of the work if the author has not conveyed the rights exclusively to a publisher, an agent acting on the author's behalf, etc. In the case of public domain material, the Grantor field would indicate the entity that has researched and demonstrated the public domain status of the work.
Credentials
Authentication information for the Grantor of this Use. Envisioned to use industry-standard Public Key Infrastructure (PKI), which is highly secure and widely deployed.
To obtain credentials, an entity may supply a legally demonstrable digital identity (such as an X.509 certificate from a certificate authority that requires notarized proof of identity), or go through another credentialed entity. Thus it is envisioned that most authors will demonstrate the authenticity of their requests via credentialed authors groups (SFWA, AG, etc., which verify members' credentials when they join) or via their publishers. (For example, an authors group such as SFWA or a publisher like Random House would serve as a point of dissemination of password-protected access to the COCOA system.)
Which authors groups and publishers are considered credentialed would be overseen by an industry board with representatives from the above mentioned stakeholders. This COCOA Oversight group would also oversee operations of the secure web site.
Requests for access from authors who are neither members of a credentialed authors group nor represented by a credentialed publisher (of whom there should be few, though they may include authors whose rights have reverted, estates of authors, etc.) would be carefully verified by the COCOA Oversight board.
Grantee
To whom the Use is granted. Grantees could include
- The Public
- "All content displayers" (Amazon, Google, others)
- A list of Individual entities (Amazon, Google, etc.)
The grant may be made to any entity or group of entities. New entities may be created by anyone for the purpose of being a grantee. (No credentials are needed for a grantee, only an unambiguous description, though grantees may obtain Credentials if they desire. Grantors, however, must have credentials to make a grant.)
For use in contacting the rightsholder to obtain permission to use their works, note that the record may include public contact information for the Grantor (such as an author's agent), or that information may be obtained via the site providing authentication of the Grantor's identity.
Record Date
The date this grant/revocation record was made.
Effective Date
The date this grant/revocation takes/took effect.
Request priority
Since different stakeholders for a given property may be granting uses, and stakeholders wish to grant uses in the easiest possible way to whole categories of properties, this Priority field establishes which grant applies to a given property when more than one is applicable. A lower number takes precedence. Thus if two grants are found for a specific book, one for "all books by Random House" and one for "ISBN #0123456789", the first takes lower precedence: the ISBN grant, at priority 1, overrides the priority-500 grant for "all books by Random House" for that specific ISBN.
Thus this is a "cascading" priority.
Proposed priority levels are:
- default=999 (For any work not otherwise described)
- multi-publisher=750 (e.g. if several publishers agree to one approach, that approach could be given a name)
- publisher=500
- publisher+subcategory=250 (subcategories could be e.g. "novel", "anthology", "textbook", "reference" etc.)
- author=50
- single title=10
- single ISBN=1
Additional ones can be defined as needed.
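A minimal Python sketch of the cascading rule, assuming the levels listed above and that every grant already known to match a given work has been collected; the `Grant` type and `governing_grant` function are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Priority levels proposed above; a LOWER number takes precedence.
PRIORITY = {
    "default": 999,
    "multi-publisher": 750,
    "publisher": 500,
    "publisher+subcategory": 250,
    "author": 50,
    "single title": 10,
    "single ISBN": 1,
}


@dataclass
class Grant:
    record_id: int
    priority: int
    use: str  # human-readable description of the allowed use


def governing_grant(applicable: list[Grant]) -> Optional[Grant]:
    """Pick the grant that governs a work from all grants that match it.

    The draft does not say how to break ties between equal priorities;
    this sketch simply keeps the first one found.
    """
    if not applicable:
        return None
    return min(applicable, key=lambda g: g.priority)


# Hypothetical grants matching one book, mirroring the Random House example.
grants = [
    Grant(101, PRIORITY["publisher"], "all books by Random House: block every 3rd page"),
    Grant(102, PRIORITY["single ISBN"], "ISBN 0123456789: show 100%"),
]
print(governing_grant(grants).use)  # ISBN 0123456789: show 100%
```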
Duration
The grant described would be good for the length of time specified, such as:
- Until mm/dd/yyyy
- Until revoked
- Perpetuity (e.g., for public domain work, or if this grant is for selling a book to a reader, who is thus allowed to read it ever after) [Perpetuity would nonetheless always be limited by laws or discovery of errors]
Realm
Where the grant is effective, e.g.,
- Within the US or another specific country
- Common group of countries, such as the EU
- List of countries/groups
- Worldwide
This addresses concerns such as public domain works in one country still being under copyright in another, etc.
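Effective Date, Duration, and Realm together decide whether a grant applies at a given moment and place. The following Python sketch combines them under several assumptions that the draft leaves open: countries are ISO 3166 two-letter codes, realm groups such as the EU come from a hypothetical (and here abbreviated) lookup table, and an "until" date is inclusive.

```python
from __future__ import annotations

from datetime import date
from typing import Optional

# Hypothetical realm groups (abbreviated); a real system would need an agreed,
# maintained list. Countries are written as ISO 3166 two-letter codes here.
REALM_GROUPS = {"EU": {"DE", "ES", "FR", "IT"}}


def in_realm(realm: set[str], country: str) -> bool:
    """True if `country` falls within any entry of the grant's Realm."""
    if "WORLDWIDE" in realm:
        return True
    return any(r == country or country in REALM_GROUPS.get(r, set()) for r in realm)


def grant_in_force(effective: date, until: Optional[date], revoked: bool,
                   realm: set[str], on: date, country: str) -> bool:
    """Check Effective Date, Duration and Realm together.

    until=None stands for "until revoked" or "perpetuity"; an "until" date
    is treated as inclusive (an assumption; the draft does not say).
    """
    if revoked or on < effective:
        return False
    if until is not None and on > until:
        return False
    return in_realm(realm, country)


print(grant_in_force(date(2006, 2, 28), None, False, {"US"}, date(2007, 1, 1), "US"))  # True
print(grant_in_force(date(2006, 2, 28), None, False, {"US"}, date(2007, 1, 1), "FR"))  # False
```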
Source of grant
A description of what legal document is conveying the grant.
- Per external contract (described as ____)
- Per a Creative Commons license
- Per law (such as a citation to US Title 17 for public domain works)
- Per this COCOA record (that is, COCOA itself could be used, if desired, to be the granting document)
Compensation
A description of compensation (if any) to exercise the grant.
For many uses this would be "none," such as for public domain work, or granting Amazon or Google the right to use certain page images, or putting a work out under a Creative Commons license.
For generality, however, payment models could be embedded here to describe any payment required. For example, a publisher or book seller could specify the price for an electronic edition of a book a consumer would pay, or a newspaper could specify prices for accessing archival data, a self-published author could specify how to purchase their book, etc.
A single product could thus have multiple records, granting different uses to different parties, some free, some requiring payment.
Payment would be specified as an amount and a vehicle, such as via credit card, PayPal, check sent to a postal address, etc. or a list of such.
What is being granted -- description of the Content and the allowed Use:
Content description:
Media type
Paginated text, non-paginated text (e.g. web page), audio, video, etc.
Media sub-category (perhaps for later protocol version)
Group of like works to apply algorithm to, based on standardized categories of works, e.g., all fiction, novels, anthologies, text books, reference books, cook books, etc. -- in print vs. out of print
This must refer to some standard classification; otherwise it won't apply uniformly, and one work might be put in two different categories.
The purpose is to let a publisher say, "for all my textbooks, use the following approach."
Product Identifier
Such as ISBN, SKU#, GUID, or other description that unambiguously identifies what product or set of products is described. Identifiers for sets of products will include "all works by grantor", "all works by grantor of indicated sub-category", etc.
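Because a single-ISBN record takes precedence over everything else, a mistyped ISBN would silently apply a grant to the wrong book, or to none. An entry form could catch most such typos with the standard ISBN-10 check digit, sketched here in Python (the function name is illustrative; ISBN-13 uses a different check-digit rule and is not handled).

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Validate an ISBN-10 check digit; hyphens and spaces are ignored.

    Rule: weight the ten characters 10, 9, ..., 1 and require the weighted
    sum to be divisible by 11. 'X' means 10 and is allowed only as the last
    (check) character.
    """
    chars = [c for c in isbn.upper() if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, c in enumerate(chars):
        if c in "0123456789":
            value = int(c)
        elif c == "X" and position == 9:
            value = 10
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("0451458788"))     # True: the anthology in the second sample record
print(is_valid_isbn10("0-451-45878-9"))  # False: last digit mistyped
```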
General description of content
Text description of the content for which use is being granted, such as "a novel about first contact with aliens." This has particular value in identifying "orphan" works. For entire-author grants, this may include a description of the general kinds of work the author produces.
(optional) Sample of Content
For use in identifying "orphan" works, the record may include, for example, the first 100 words of the text, thumbnail images of book covers, movie trailers, audio samples, thumbnail images for photographs, etc.
(optional) Anti-Piracy Content
Data for anti-piracy / copyright infringement detection systems may be included here.
Allowed use(s):
General description of use
A textual description of the allowed use, such as "allowing 100% of this novel to be visible on Amazon.com's Search Inside the Book."
Most of these descriptions would be generated by software based on the options chosen in setting up the record. The text could then be edited, if desired, by the grantor.
Access algorithm type
If there is a convenient way to describe what kind of access is allowed, it could be described in an "algorithmic" way (that is, in a manner a computer program could automatically use). This is important for entities like Amazon.com or Google Print to use so they could have their software calculate which pages of books would be visible on their site, without manual intervention on their part to set up a book in Search Inside or Google Print.
It would also allow descriptions of music or video samplers that sites could use ("allow sampling first 20 seconds" of a song, etc.).
Paginated text (e.g., books, magazines):
specific list of pages to allow
algorithmic based on table of contents (e.g. block last 1/3 of each entry in the Table of Contents)
algorithmic based on entire book (e.g. block last 1/3 of book, every 3rd page)
allow any page but with per-customer viewing limits
as future expansion: rectangular coordinates of content to block on a page (e.g. to blank out half a page, or a photograph)
Non-paginated text (e.g., web pages):
specific list of bytes to allow
regular expression from-to?
algorithmic based on table of contents (e.g. block last 1/3 of each entry in Contents)
algorithmic based on entire book (e.g. block last 1/3 of book, every 3rd page)
Audio
list of time segments to allow ("from 10 seconds in to 40 seconds in")
algorithmic based on entire work (e.g. block 3/4 of song)
Video
list of time segments to allow ("from 10 seconds in to 40 seconds in")
algorithmic based on entire work (e.g. block 3/4 of the video)
block audio but show video for given time range...?
Access algorithm
A companion data element to the above "access algorithm type." While the type describes the general kind of access, this field specifies the algorithm itself; the parameters the algorithm uses are given in "Data for algorithm" below.
Algorithms could include
- List (redundant, since it is implied by the algorithm type; the others are not)
- Allow first N pages (or seconds, minutes)
- Block last N%
- Block every Nth page
- Allow any page but with per-customer viewing limit of N% and no more than M consecutive pages
- Existing "Amazon" algorithm
- Existing "Google" algorithm
Access algorithms would certainly include those currently in use by Amazon and Google.
Data for algorithm
This delineates any specific parameters the access algorithm needs -- for example, if the algorithm is "Allow first N pages", this section would specify that N is 20, or 30, or whatever is desired.
- page list
- parameters like "N"s above
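To make the "algorithmic" idea concrete, here is a minimal Python sketch that turns several of the algorithms listed above, plus their "Data for algorithm" parameters, into the set of visible pages. The algorithm names, the parameter names `n`, `pct`, and `pages`, the 1-based page numbering, and rounding the blocked count up are all assumptions of this sketch, not part of the proposal.

```python
from typing import Callable, Dict, Iterable, Set

# Pages are numbered 1..num_pages in this sketch (an assumption).


def allow_first_n(num_pages: int, n: int) -> Set[int]:
    """'Allow first N pages'."""
    return set(range(1, min(n, num_pages) + 1))


def block_last_percent(num_pages: int, pct: int) -> Set[int]:
    """'Block last N%': the blocked count is rounded up, so the limit is never exceeded."""
    blocked = -(-num_pages * pct // 100)  # integer ceiling of num_pages * pct / 100
    return set(range(1, num_pages - blocked + 1))


def block_every_nth(num_pages: int, n: int) -> Set[int]:
    """'Block every Nth page': pages n, 2n, 3n, ... are hidden."""
    return {p for p in range(1, num_pages + 1) if p % n != 0}


def allow_list(num_pages: int, pages: Iterable[int]) -> Set[int]:
    """'List': show exactly the listed pages that exist in the book."""
    return {p for p in pages if 1 <= p <= num_pages}


# Hypothetical algorithm names; "Data for algorithm" becomes keyword arguments.
ALGORITHMS: Dict[str, Callable[..., Set[int]]] = {
    "allow first N pages": allow_first_n,
    "block last N%": block_last_percent,
    "block every Nth page": block_every_nth,
    "list": allow_list,
}


def visible_pages(algorithm: str, num_pages: int, data: dict) -> Set[int]:
    return ALGORITHMS[algorithm](num_pages, **data)


print(len(visible_pages("block last N%", 300, {"pct": 25})))        # 225
print(sorted(visible_pages("block every Nth page", 12, {"n": 4})))  # [1, 2, 3, 5, 6, 7, 9, 10, 11]
```

Keeping the algorithms in a lookup table mirrors the proposal's note that new schemas can be added later: a new entry extends the table without changing existing records.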
(optional) Location of content
How to obtain the content itself. This field would allow, for example, a publisher to put page images of their books on the web at a secure location that Amazon or Google could use to obtain the images directly. As noted below under Confidentiality, this could be specified in an encrypted manner that only the recipient could read.
Where desired, this would also allow consumers to directly obtain the content from the indicated provider(s), such as if they've paid to purchase it.
Notes
Confidentiality
Since some grantors may not wish the details of their grants to be public, COCOA shall allow any or all data items to be encrypted. Encryption shall be standard public key encryption, meaning that the encrypted data items would only be readable by the grantor and grantee.
This includes an entire record being encrypted. Only the grantee and grantor would know the record is for them.
Regardless, all records will carry a digital signature of the grantor (i.e., a standard public-key signature computed over a secure hash of the record, which digitally proves the data were provided by the grantor; not a digitized human signature). This authenticates that the grantor has granted this use. (For public domain records, the signature would be of the entity who performed the research to demonstrate that the work is in the public domain.)
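A secure hash alone does not prove who produced a record; the grantor signs the hash with its private key, and anyone holding the grantor's public key can verify the signature. The Python sketch below shows only the hashing step, so that logically identical records always hash the same and any altered field changes the hash. It uses JSON with sorted keys as a stand-in canonical form; since COCOA records are to be XML, a real implementation would need XML canonicalization (as used by W3C XML Signature), and the signing step, omitted here, would come from a PKI library.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """SHA-256 of a canonical serialization (sorted keys, fixed separators)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"Operation": "Grant", "Priority": 1, "Product": "ISBN 0451458788"}
b = {"Product": "ISBN 0451458788", "Priority": 1, "Operation": "Grant"}
c = dict(a, Priority=500)
print(record_digest(a) == record_digest(b))  # True: field order does not matter
print(record_digest(a) == record_digest(c))  # False: any change alters the digest
```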
Implementation
The COCOA record will be presented using XML, a standard language for describing data.
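The draft does not define an XML schema, so the element and attribute names below are hypothetical. This Python sketch, using the standard library's `xml.etree.ElementTree`, shows how a few fields of the first sample record in "Possible Sample Output" might be serialized.

```python
import xml.etree.ElementTree as ET


def record_to_xml(rec: dict) -> str:
    """Serialize a few COCOA fields to XML. Element names are hypothetical."""
    root = ET.Element("cocoaRecord", version=rec["version"])
    for name in ("recordId", "operation", "grantor", "priority"):
        ET.SubElement(root, name).text = str(rec[name])
    grantees = ET.SubElement(root, "grantees")
    for g in rec["grantees"]:
        ET.SubElement(grantees, "grantee").text = g
    algo = ET.SubElement(root, "accessAlgorithm", type=rec["algorithmType"])
    for key, value in rec["algorithmData"].items():
        ET.SubElement(algo, "param", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Values taken from the first sample record in "Possible Sample Output".
penguin_default = {
    "version": "0.9",
    "recordId": 123456789,
    "operation": "Grant",
    "grantor": "Penguin Putnam, Inc.",
    "priority": 500,
    "grantees": ["Amazon.com Search Inside the Book", "Google Print"],
    "algorithmType": "Per-customer limit",
    "algorithmData": {"N": 30, "M": 10},
}
print(record_to_xml(penguin_default))
```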
COCOA records would be entered through a web-based interface and stored in a central database. Copies of this database would be made available to any who want it, as well as allowing queries against the database.
Queries would include, for example,
- "Find all works that grant me, Amazon.com, the ability to display them in Search Inside the Book"
- "Find all public domain books I can read."
- "Find music that allows playing the entire song for free."
- "Find an electronic edition of a book entitled ... and tell me how/whom I pay to read it."
Thus displayers of book images could pull the list of books they can display and compare it with the previous list to find new works. Additionally, if the page image locations are indicated, Amazon, Google, etc. could immediately download the page images to load into their databases.
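A displayer's side of this exchange might look like the following Python sketch, under simplifying assumptions: each record names a single product, publisher-level grants have already been expanded to individual products, and priority, realm, and date-of-effect checks are left out. It replays Grant and Revoke records in date order and then diffs against the previous pull; all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set

ALL_DISPLAYERS = "All content displayers"


@dataclass
class Record:
    record_date: date
    operation: str       # "Grant" or "Revoke"
    grantees: List[str]
    product: str         # simplified product identifier, e.g. "ISBN 0451458788"


def displayable(records: List[Record], me: str) -> Set[str]:
    """Products currently granted to `me`: records are replayed in date order,
    so a later Revoke cancels an earlier Grant."""
    granted: Set[str] = set()
    for rec in sorted(records, key=lambda r: r.record_date):
        if me not in rec.grantees and ALL_DISPLAYERS not in rec.grantees:
            continue
        if rec.operation == "Grant":
            granted.add(rec.product)
        elif rec.operation == "Revoke":
            granted.discard(rec.product)
    return granted


def new_works(current: Set[str], previous: Set[str]) -> Set[str]:
    """Works that appeared since the last pull of the list."""
    return current - previous


records = [
    Record(date(2006, 1, 1), "Grant", ["Amazon Search Inside the Book"], "ISBN 0451458788"),
    Record(date(2006, 1, 2), "Grant", [ALL_DISPLAYERS], "ISBN 0123456789"),
    Record(date(2006, 3, 1), "Revoke", [ALL_DISPLAYERS], "ISBN 0123456789"),
]
print(displayable(records, "Amazon Search Inside the Book"))  # {'ISBN 0451458788'}
```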
Possible Sample Output
To set a default for all fiction books published by Penguin:
Version of protocol: 0.9
COCOA Record ID: 123456789
Operation: Grant
Grantor: Penguin Putnam, Inc.
Credentials: XXXXXX
Grantee: Amazon.com Search Inside the Book,
Google Print
Record Date: 2006-01-01
Effective Date: 2005-10-27
Priority: 500 (i.e., per-publisher default for all books)
Duration: 3 years
Realm: United States
Source of grant: External contracts
Compensation: None
Media type: Paginated text
Media sub-category: All books
Product Identifier: All fiction books by this publisher
General description of content: Fiction books published by Penguin
Sample content: --
General description of use:
Allow viewing any page but with per-customer viewing
limit of 30% and no more than 10 consecutive pages
Access algorithm type: Per-customer limit
Access algorithm: Max N% per customer, max M consecutive pages
Data for algorithm: N=30, M=10
(optional) Location of content: none
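As an illustration of how a displayer might enforce "N=30, M=10" from the record above, here is a minimal Python sketch. The class name is hypothetical, customers are keyed by an opaque ID (Amazon, as noted earlier, tracks viewing per credit card number), and the rounding and re-viewing rules stated in the docstring are assumptions.

```python
from typing import Dict, Set


class PerCustomerLimit:
    """'Max N% per customer, max M consecutive pages' for one book.

    Assumptions of this sketch: the N% quota is rounded down, re-viewing a
    page already seen costs nothing, and a page is refused if it would make
    the customer's run of consecutive viewed pages longer than M.
    """

    def __init__(self, num_pages: int, n_percent: int, m_consecutive: int):
        self.quota = num_pages * n_percent // 100
        self.max_run = m_consecutive
        self.seen: Dict[str, Set[int]] = {}

    def _run_length_with(self, pages: Set[int], page: int) -> int:
        """Length of the consecutive run that would contain `page`."""
        low = page
        while low - 1 in pages:
            low -= 1
        high = page
        while high + 1 in pages:
            high += 1
        return high - low + 1

    def request(self, customer: str, page: int) -> bool:
        """Return True (and record the view) if `customer` may see `page`."""
        pages = self.seen.setdefault(customer, set())
        if page in pages:
            return True
        if len(pages) >= self.quota:
            return False
        if self._run_length_with(pages, page) > self.max_run:
            return False
        pages.add(page)
        return True


limit = PerCustomerLimit(300, 30, 10)  # N=30, M=10; quota is 90 pages
print([limit.request("card-1", p) for p in range(1, 12)].count(True))  # 10: page 11 would make 11 in a row
```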
To block certain pages of a specific short story anthology:
Version of protocol: 0.9
COCOA Record ID: 123456790
Operation: Grant
Grantor: SFWA, Inc. and Penguin Putnam, Inc.
Credentials: XXXXXX
Grantee: Amazon Search Inside the Book
Record Date: 2006-01-01
Effective Date: 2006-02-28
Priority: 1 (i.e., per-ISBN override of publisher defaults)
Duration: 3 years
Realm: United States
Source of grant: This COCOA record
Compensation: None
Media type: Paginated text
Media sub-category: fiction anthology
Product Identifier:
ISBN 0451458788, Nebula Awards Showcase 2002
General description of content: Science fiction and fantasy stories that
have won or were a finalist for SFWA's Nebula Award
Sample content: From "Daddy's World" / Walter Jon Williams:
One day Jamie went with his family to a new place ...
[e.g. first 100 words of each story]
Anti-piracy content: CopyHunter key phrases:
From "Daddy's World" / Walter Jon Williams:
"buildings statues pictures parks"
From "Darwin's Radio" / Greg Bear:
"arm mitch caught glimpses",
"hormonally induced melanosis melanophores"
General description of use: Free visibility for all but last part of
each short story
Access algorithm type: Page list
Access algorithm: Show allowed pages from list of pages
Data for algorithm: List of pages that can be shown:
0-49, 60-70, 94-102, 107-134, 149-207, 239-280
(optional) Location of content: none
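The page list in this record is written for people; a displayer would parse it into ranges before deciding which page images to serve. A Python sketch, assuming comma-separated inclusive ranges in which a bare number stands for a single page, and taking page numbers exactly as written (the sample starts at page 0); the function names are illustrative.

```python
from typing import List, Tuple


def parse_page_list(spec: str) -> List[Tuple[int, int]]:
    """Parse e.g. "0-49, 60-70, 94" into inclusive (first, last) ranges."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, dash, last = part.partition("-")
        lo = int(first)
        hi = int(last) if dash else lo  # a bare number is a one-page range
        if hi < lo:
            raise ValueError(f"descending range: {part!r}")
        ranges.append((lo, hi))
    return ranges


def page_visible(page: int, ranges: List[Tuple[int, int]]) -> bool:
    return any(lo <= page <= hi for lo, hi in ranges)


allowed = parse_page_list("0-49, 60-70, 94-102, 107-134, 149-207, 239-280")
print(page_visible(49, allowed), page_visible(50, allowed))  # True False
print(sum(hi - lo + 1 for lo, hi in allowed))                # 199
```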
Conclusion
COCOA is designed to be flexible, easy to use, and comprehensive, and to adhere to the copyright law precept that it's the copyright owners who specify what's visible, not non-copyright owners such as Amazon or Google. With COCOA in place, publishers and authors can place their work into search engines such as Amazon with full confidence that only what makes sense to show of their work is what will be shown. Google, Amazon, et al. will be able to scan, OCR, index, and return search results for all such works.
Given the level of confidence this will bring to authors and publishers, the amount of copyrighted material available in such search engines will vastly increase, to everyone's benefit.