Deep Learning: A Sport of Kings?

The big news in the machine learning/deep learning world this week is Google’s release of TensorFlow, their deep learning toolkit. This has prompted some to ask: why would they give away “crown jewels” for such a strategic technology? The question is best answered with a machine learning joke (paraphrased): “the winners usually have the most data, not the best algorithms”.

Neural networks have been around for a while, but it’s only been within the past 10 years that researchers have figured out how to train networks with many, many layers (the “deep” in “deep learning”). That research has been greatly accelerated by using GPUs as very high-performance, general-purpose vector processors. If a researcher can turn around an algorithm experiment in a day (vs 3 months), a lot more research gets done.

But as the joke suggests, it’s all about that data: you need lots and lots and LOTS of data to train a high-performance deep learning network. And Google has more data than anyone else, so they don’t worry so much about giving away algorithms.

(Also, Google, Baidu, Twitter, Facebook, etc. are investing in GPU compute clusters that can only be described as the new “mainframe supercomputers”. Sure, you can rent GPU instances on Amazon, but there’s nothing like having the latest Nvidia board with lots of RAM and a very high-performance interconnect.)

What does this all mean for early stage startups? The situation creates several tough hurdles: first, freely available code and technology from Google (and Facebook) enables competitors and devalues whatever the startup might develop. Second, few startups have access to a large enough proprietary data source to compete at scale. And third, GPU compute clusters need real capital.

What’s left for startups? I see at least two interesting patterns:

  • Using deep learning as a key feature to enhance another app.  Use freely available technology to add magic.  Google Photos is a great example of this, and I think every photo and video app will soon be able to recognize stuff, people, items, etc. to enhance the functionality.
  • “Man-teaches-machine”.  Start out with a lot of humans doing some task and capture their work to train a network.  Over time, have the network handle the common cases, with the exceptions / ambiguous cases routed to humans for resolution.  Build a large, proprietary training set, enjoy compounded interest, and profit.
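The “man-teaches-machine” loop can be sketched in a few lines. This is only an illustration of the routing idea, not a production design: the 0.9 confidence threshold, the `HumanInTheLoop` class, and the toy `toy_predict` / `toy_human` functions are all hypothetical stand-ins for a real model and a real review queue.

```python
from dataclasses import dataclass, field


@dataclass
class HumanInTheLoop:
    """Route confident predictions to the model and the rest to people."""
    threshold: float = 0.9                        # assumed confidence cutoff
    training_set: list = field(default_factory=list)

    def handle(self, item, predict, ask_human):
        label, confidence = predict(item)
        if confidence >= self.threshold:
            # Common case: the network is confident enough to answer alone.
            return label, "machine"
        # Ambiguous case: a person resolves it, and the answer is captured
        # as a new labeled example for the next round of training.
        label = ask_human(item)
        self.training_set.append((item, label))
        return label, "human"


# Toy stand-ins (hypothetical) for a trained network and a human reviewer.
def toy_predict(text):
    return ("spam", 0.97) if "free money" in text else ("ham", 0.55)


def toy_human(text):
    return "ham"


loop = HumanInTheLoop()
results = [loop.handle(t, toy_predict, toy_human)
           for t in ["free money now", "lunch on friday?"]]
print(results)
print(loop.training_set)
```

Every item routed to a human grows the proprietary training set, which is where the compounding comes from: as the network is retrained, fewer cases fall below the threshold.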

The GPU Overshadows the CPU

Ask a teenager about GPUs (Graphics Processing Units) and you might get a surprisingly informed response.  As I watch my kids, nephews, and their friends build “gaming PCs”, they all seem quite current on the relative performance of AMD vs Nvidia, the merits of GPU memory, power issues, etc.  (And one important side effect:  a fairly healthy family ecosystem of hand-me-down GPUs.)

While it’s great to run Battlefield 4 at 60fps on ultra detail across three HD monitors, what’s most interesting is how GPU capabilities are generalizing beyond graphics. This is one of my absolute favorite disruption patterns: “commoditization+crossover”, where a technology is commoditized by demand for one application and then applied elsewhere.

GPUs began as very specialized (and expensive) 2D & 3D hardware accelerators. Things began to change in the 1990s, driven by demand for 3D games, first with arcade units and consoles, and then PCs. In 1999, Nvidia coined the term “GPU”, starting a consumer-driven 15yr+ price/performance ramp with no end in sight.

GPUs are also getting much more generalized.  The first, fairly rigid 3D-transform computation pipelines have gradually given way to more general stream processors.  So-called graphics “shaders” are now nearly fully programmable:  GPU developers write compute “kernels” in C-like languages (such as OpenGL GLSL or DirectX HLSL) that then run on hundreds or thousands of compute units on the GPU.  And more recent technologies, such as Nvidia’s CUDA and the OpenCL platforms, dispense with the graphics-centric worldview entirely.
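To make the “kernel” idea concrete, here is a pure-Python illustration of the programming model (it is not real GPU code, and `saxpy_kernel` and `launch` are names made up for this sketch). The kernel is a small function that computes one output element, and the “launch” runs it once per index. On a GPU those invocations would be spread across the compute units; here they simply run one after another.

```python
def saxpy_kernel(i, a, x, y, out):
    """The 'kernel': computes a single output element, identified by index i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a GPU launch: invoke the kernel once for each index.

    A real GPU would run these n invocations in parallel across its
    compute units; this loop runs them sequentially for illustration.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

The key property is that no invocation depends on any other, which is what lets the hardware run thousands of them at once.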

Because of their parallel architecture, GPUs have continued to scale while single CPU performance has effectively flattened.  For certain “embarrassingly parallel” problems where a repeated operation is applied to large amounts of data, they are hard to beat. For example, $350 gets you ~3.4 trillion floating point ops/second, 42,000x faster than the original Cray supercomputer!  Amazon offers GPU instances, and even Intel has conceded in a way:  on a modern x86 multi-core processor, almost two-thirds of the die area is GPU.
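For the curious, the arithmetic behind that comparison is easy to reproduce. The Cray figure below is an assumption on my part: roughly 80 MFLOPS is a commonly quoted number for the Cray-1, and it is the value that makes the ~42,000x ratio work out. The GPU price and throughput are the ones quoted above.

```python
gpu_flops = 3.4e12     # ~3.4 trillion floating point ops/second (from the text)
gpu_price = 350.0      # dollars (from the text)
cray1_flops = 80e6     # ASSUMPTION: ~80 MFLOPS, a commonly cited Cray-1 figure

speedup = gpu_flops / cray1_flops
dollars_per_gflops = gpu_price / (gpu_flops / 1e9)

print(f"speedup over the Cray: {speedup:,.0f}x")
print(f"cost: ${dollars_per_gflops:.2f} per GFLOPS")
```

A different baseline (for example, the Cray-1's higher peak rating) would shrink the ratio accordingly, so treat the multiplier as an order-of-magnitude statement.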

It’s not surprising to see GPU horsepower applied to more and more non-graphics applications, such as simulating physics, aligning genome sequences, and training deep neural networks.  I think this pattern will continue, with the GPU firmly entrenched in computing systems as a highly scalable vector co-processor.

Why We Need a Neutral Internet, Exhibit A

I received an email from Verizon a few days ago, stating several FOX channels are no longer available because “Verizon refused to accept an agreement that contained rates that are not in our customers’ best interests”.  Presumably, FOX wanted more than Verizon was willing to pay.  (In cable TV, it’s customary for the cable TV operators to pay networks to carry their content.)  Now, those channels are playing a looping video with Verizon spokespeople, urging subscribers to call Cox Media.

Contrast this with Verizon’s stance toward Netflix, where they want the opposite arrangement:  Netflix pays to deliver content over Verizon’s network, citing “When one party’s getting all the benefit and the other’s carrying all the cost, issues will arise”.  (Other ISPs share this view, and Netflix has entered such an agreement with Comcast & Verizon.)

This inconsistent situation is precisely why we need a neutral Internet.

Payments flowing between ISPs and content providers distort the market, introduce friction, and shift control to the ISPs.  Ultimately, they hinder innovation: compare the closed, legacy platforms (cable TV, pre-smartphone cell phones) with the enormous economic, quality-of-life, and strategic benefits of the new, open platforms (the Internet, smartphones).  If standing up a new Web site were as hard as signing up cable TV providers for your new cable channel, or getting a carrier to carry your mobile app “on deck” (pre-smartphone), we’d be a fraction as advanced as we are today.

Allowing business models for legacy, closed networks onto the Internet is a fundamental policy mistake.  If we go that way, how long until:

Verizon is sorry to inform you that {Netflix,Amazon,Battlefield,Youtube,etc.} will be unavailable (or available only at a reduced performance) because [content provider] refused to accept content distribution rates in our customers’ best interests.

Teaching Kids Programming

Getting kids interested in programming is a lot harder than it used to be. I was lucky enough to come of age during the PC revolution. My brother and I would carefully enter multiple pages of BASIC code from computer magazines, and then play games for weeks (making our own modifications along the way).

The problem now is that the threshold of “interesting & engaging” has risen dramatically: today’s kids are surrounded by games and applications that have had hundreds of person-years of development with gorgeous 3D graphics rendered in 1080p on huge color screens. They all carry personal supercomputers, are never off-line, have all the world’s information at their fingertips, and can download any of ~1 million applications (many for free).

“Hello world” doesn’t cut it anymore.

How do we get kids engaged with learning software development, without them first having to spend a month writing code?

Minecraft is a fabulous starting point. (I think it will go down in history as one of the most brilliant games ever.) In our household, it’s the virtual neighborhood playground. Quincy will often get on to play with a bunch of friends after school (with TeamSpeak, so they can trash talk while building secret hideouts, chasing monsters, designing complex contraptions, or just pushing each other off cliffs).

But what’s most interesting is that Minecraft is fully programmable with “redstone”, a set of digital circuit components. You can build a combination lock for your secret room (that blows up with the wrong combination), a completely automated train system, or even a scientific calculator or 8-bit computer. It’s fun, it’s play, and it’s something to show off to friends.  And, it’s programming.
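Here is roughly what that combination lock looks like when the same logic is written as code. This is an illustration only, not a description of any particular redstone build: the gates are modeled as small Python functions, and the `SECRET` lever pattern, the `lock` and `trap` functions, and the button input are all made up for the example.

```python
def gate_not(a):
    return 1 - a


def gate_and(a, b):
    return a & b


def gate_xor(a, b):
    return a ^ b


def gate_xnor(a, b):
    """Outputs 1 when both inputs are the same."""
    return gate_not(gate_xor(a, b))


SECRET = (1, 0, 1, 1)   # hypothetical lever combination for the secret room


def lock(levers):
    """Door signal: 1 only if every lever matches its secret bit."""
    signal = 1
    for lever, bit in zip(levers, SECRET):
        signal = gate_and(signal, gate_xnor(lever, bit))
    return signal


def trap(levers, button):
    """Trap signal: 1 if the button is pressed with the wrong combination."""
    return gate_and(button, gate_not(lock(levers)))


print(lock((1, 0, 1, 1)), trap((1, 0, 1, 1), 1))  # 1 0  (door opens, no boom)
print(lock((1, 1, 1, 1)), trap((1, 1, 1, 1), 1))  # 0 1  (door stays shut, boom)
```

A kid wiring this up in the game is doing the same decomposition into gates, just with blocks instead of functions.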

Taking Minecraft a step further, there’s the physical world itself. Between Arduino, Raspberry Pi, and an ever-growing set of easy-to-use components and modules, it’s never been easier to sense and manipulate physical things with software. You want an alarm that goes off when somebody goes in your bedroom? No problem. Now, let’s enhance it so it only goes off when it’s your sister, and also sends a text message with a picture of the offender.  You’re not downloading that from the app store!

Presale Resistance Syndrome (PRS)

I’ve written previously about presales (e.g. Kickstarter or Indiegogo) as a tool for hardware startups.  The model enables risky & crazy ideas that would normally never see the light of day. Most will fail, but some will get through and be hugely disruptive. For example, Pebble’s record-setting Kickstarter campaign accelerated their business and, more fundamentally, defined the entire smart watch category.

In spite of this, I still meet entrepreneurs who resist the idea. Objections vary, but include:

  • Our target demographic does not line up with Kickstarter’s.
  • OUYA had a very successful campaign, but still failed. We don’t want to be associated with that.
  • It’s a lot of marketing work and distraction.
  • We’d rather just raise equity financing [and not have to ship all those orders].
  • We’ve launched products before; we know how to do this.

A presale is the marketing analogy to software testing: it tests product-market fit & demand before risking production investment. Of course, it’s not perfect: just like a “passed” test case is no guarantee a system works, a successful presale does not guarantee market success.  But a failure is extremely telling, and a presale (like software testing) can be a powerful tool to de-risk the journey.

The Right to Remember

Earlier this year, Mario Costeja-González won the right to be forgotten.  The Court of Justice of the EU ruled Google had to remove search results linking to a 1998 newspaper article about the foreclosure of his home (due to unpaid debts he later paid).  In the ultimate irony, he’s now permanently and widely remembered for precisely what he wanted everyone to forget (the Streisand Effect).

Now, search engines must consider requests from individuals to remove search results that:

appear to be inadequate, irrelevant or no longer relevant or excessive in the light of the time that had elapsed 

This raises the key question:  who judges this?  Something “irrelevant” to one person might be highly relevant to another.  Not surprisingly, Google is making its point by notifying Web sites when results are removed.

This decision raises fundamental questions about the right to inform & freedoms of speech and press.  The newspaper’s freedom to publish the foreclosure news is clearly protected, I am free to link to the news, and this blog post will eventually show up in search results.  It seems arbitrary that some have freedoms and some don’t.

For better or worse, search technology has permanently changed the privacy calculus.  Since the dawn of time we’ve enjoyed “practical obscurity”, where a lot of personal information was hard to identify, locate, or access. That’s changed, and legislators will now chase the issue with law and rulings in a never-ending game of Whac-a-Mole. For example, how long until someone finds ways to detect links that were removed and publishes them?

(Given this new world, a far better strategy for Mr. Costeja-González would be to generate new content and bury the foreclosure news in the noise.)

The Internet never forgets; plan accordingly.

From Felony Technique to iOS Feature

Last year, I shared a letter I sent to US Attorney Carmen Ortiz, AUSA Stephen Heymann, and others who prosecuted Aaron Swartz.  I felt (then and now) the government over-prosecuted this case, consuming significant prosecutorial & investigative resources and taking a negotiation stance way out of line with what Mr. Swartz actually did.

One of the key points in the government’s indictment was Mr. Swartz changing the MAC addresses on his laptop to avoid MIT’s attempts to block access.  (Since MIT’s network is completely open, MAC address tracking and blocking is the only real way to shut someone down, short of finding the physical device.)

And now, Apple has announced that MAC addresses in iOS 8 will be randomized, a user privacy feature to thwart tracking.  In other words, Apple has feature-ized the same technique Mr. Swartz used to avoid being tracked and blocked!  There’s a certain absurdity here I can’t quite express.
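For a sense of how little is involved, here is a minimal sketch of generating a randomized MAC address. To be clear, this is not Apple’s implementation (I don’t know its details), and `random_private_mac` is a name invented for the example. The one real rule it follows is the IEEE 802 convention that a self-assigned address sets the “locally administered” bit and clears the “multicast” bit in the first octet.

```python
import secrets


def random_private_mac():
    """Return a random, locally administered, unicast MAC address string."""
    octets = bytearray(secrets.token_bytes(6))
    # First octet: set bit 0x02 (locally administered, i.e. not a
    # manufacturer-assigned address) and clear bit 0x01 (unicast, not multicast).
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)


mac = random_private_mac()
print(mac)
```

Actually applying such an address to a network interface is operating-system specific and is left out here; the point is only that the technique itself is a few lines of code.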

To be clear:  I’m not defending what Mr. Swartz did.  But this is one small example of why this case has gotten so much attention.  When our prosecutors and law enforcement professionals don’t understand the technology (and don’t bring in experts who do) in complex cases, justice isn’t served.

If You’re Developing Any 3D Printing Tech, Don’t Buy A Stratasys Printer

I’ve written before about the risks of building on various Internet platform APIs (e.g. Facebook, Google, etc.) — many SDK agreements let the platform copy anything they want, while you have no recourse.

I just learned about a similar example in the hardware world. From Stratasys’s licensing agreement:

Customer hereby grants to Stratasys a fully paid-up, royalty-free, worldwide, non-exclusive, irrevocable, transferable right and license in, under, and to any patents and copyrights enforceable in any country, issued to, obtained by, developed by or acquired by Customer that are directed to 3D printing equipment, the use or functionality of 3D printing equipment, and/or compositions used or created during the functioning of 3D printing equipment (including any combination of resins, such as combinations relating to multi-resin mixing, color dithering or geometrical resin-mixture structure of the resin) that is developed using the Products and that incorporates, is derived from and/or improves upon the Intellectual Property and/or trade secrets of Stratasys. Such license shall also extend to Stratasys’ customers, licensors and other authorized users of Stratasys products in connection with their use of Stratasys products.

In simpler terms, if you (a) own a product subject to this license, and (b) invent something related to 3D printing, Stratasys and all of their customers have a right to use your invention without paying you.

(Technically, it says that your invention must incorporate, or be derived from, or improve upon their IP.  Given the breadth of their patents, they will argue anything in 3D printing meets this test.  They also include “trade secrets”, which are, well, secret.)

I’ve seen some audacious licensing agreements, but this one takes the cake!

Tech Cases at the Supreme Court

I thought two recent Supreme Court cases were especially interesting.

In Riley v California, the court ruled police need a warrant to search your cell phone. The court recognized that your phone contains so much information, it deserves Fourth Amendment protection.  Justice Alito made an interesting point though:  if you’re arrested carrying your phone bill and your cell phone, the police can use the call log information on the bill.  But if you have just your phone, they need a warrant to get the call log.  Still, he conceded he does “not see a workable alternative”.

ABC v Aereo, Inc. is less straightforward.  Aereo is the company that rents out tiny antennas in Manhattan, so users can stream broadcast TV on-line. The court found Aereo infringed the Copyright Act, which says that copyright holders have exclusive rights:

(1) to perform or display it at a place open to the public or at any place where a substantial number of persons outside of a normal circle of a family and its social acquaintances is gathered; or

(2) to transmit or otherwise communicate a performance or display of the work to a place specified by clause (1) or to the public, by means of any device or process, …..

(Emphasis added).  This “transmit clause” (2) was added in 1976 in response to cable TV, when Congress noted:

… the Committee believes that cable systems are commercial enterprises whose basic retransmission operations are based on the carriage of copyrighted program material and that copyright royalties should be paid by cable operators to the creators of such programs.

By using individual antennas, Aereo tried to thread the needle on the semantics of “public” and “perform”, and most of the court’s opinion addresses that hair splitting.

This is a pattern we’ll see more and more:  companies using new technology to go right up to the edge of the law (nobody imagined private antennas in 1976) and courts splitting legal hairs to (try and) sort it all out.

Stratasys’s “Heated Build Enclosure” Patent

I gave a talk on patents at Bolt last month.  I covered the patent process and strategy for startups, but one of my key points was: don’t get too excited about a patent until you read the claims. The claims describe, very specifically, what the patent covers.

So it was interesting to hear several references to Stratasys’s 3D printer “heated build enclosure” patent recently.  (Background:  in certain 3D technologies, there’s less warping if the printer chamber is toasty warm.)  The 3D community refers to the “broad applicability” of this patent and is waiting for it to expire.

Intrigued, I studied the claims.  Here’s claim 1:

A three-dimensional modeling apparatus comprising a heated build chamber in which three-dimensional objects are built, a base located in the build chamber, a dispensing head for dispensing modeling material onto the base, the dispensing head having a modeling material dispensing outlet inside of the build chamber, and an x-y-z gantry coupled to the dispensing head and to the base for generating relative movement in three-dimensions between the dispensing head and the base, characterized in that:
the x-y-z gantry is located external to the build chamber and is separated from the chamber by a deformable thermal insulator.

(Emphasis added).  Note the gantry is external to the build chamber and all other claims specify this.  In fact, the specification highlights the disadvantages of putting the gantry inside:

Placing the extrusion head and the x-y-z gantry in this heated environment has many disadvantages. The x-y-z gantry is comprised of motion control components, such as motors, bearings, guide rods, belts and cables. Placing these motion control components inside the heated chamber minimizes the life of these components.

The implication is simple:  a chamber with the gantry inside or partially inside would not be “external” and would not infringe this patent.

Read the claims!