Link: Copilot investigation

The new hotness in tech seems to be training AI on datasets without the owner/subject’s knowledge or consent and using that to produce features or products to sell. See Clearview’s scraping of faces for facial recognition tech, AI art generators scraping artists’ works, and more (not unrelated: Cambridge Analytica and the whole surveillance ads market).

AI advocates say this content is public and therefore fair game – “artists use everything they’ve seen to inform their work”.

However, an artist’s limited, personal and somewhat curated experiences eventually produce a unique style after years of honing their craft. Companies training AI on every image that’s ever been digitised – without a single copyright owner’s consent – with the aim of selling access to the result (or getting a $bn exit) is another prospect altogether. The difference in scale is key.

Another example of this is GitHub’s Copilot, which is essentially an intelligent code autocomplete. The underlying model was trained on open source repositories without the owners’ permission.

Now a group of lawyers are investigating to see if there’s potential for a lawsuit. One to watch.