April AI SIG Notes

  • Show & Tell
  • Ethics and Privacy Issues
  • Accuracy
    • Large language models deal in probability, not truth.  Outputs must be checked for accuracy.
    • Ask the AI to cite sources – and check that they are real
  • Training Data
    • Large language models are only as good as training data.
    • Biased data leads to biased outputs: article
    • Did the company have rights to use training data?
      • Google – uses search, web, etc
      • Facebook – uses posts on FB.  Also used a database of pirated books: article
      • Beginning to sign contracts: article
  • Confidential Data
    • When you go to, say, chatgpt.com, everything you input may get added to training data, eventually.  Don’t include confidential or sensitive information.
    • AI is actually good at helping to anonymize data, such as generating lists of names to substitute for the real names.
    • Assume everything that you type on the public internet is added to training data.
    • New downloadable models run on your personal device.  Your interactions are not necessarily uploaded as training data.
    • Check the terms of use!
  • Copyright, Trademarks & Privacy
    • Purely AI-generated content cannot be copyrighted in the US. AI makes it really easy to generate illegal derivatives of copyrighted content.
    • May you use AI to create images using Coke cans, Lady Gaga’s face, or graphic violence?
      • Most products have restrictions and guidelines.  Grok (the X product) does not.
  • Authorship
    • When and how do we disclose the assistance of AI?  If it is being used as a tool in a way similar to a dictionary or word processor – i.e., as a tool whose results get filtered through a human brain – disclosure is probably not needed.  If it is doing unaudited work, disclosure probably is needed.
    • How can we make sure future AI products, such as images and letters, can be distinguished from “original” products?
  • Environmental Issues
    • Running a new machine learning model on a huge training dataset, and offering it to millions of users, requires huge amounts of electricity and water.  Article
  • Resources
    • The Family History AI Show, by Mark Thompson & Steve Little
    • Genealogy and Artificial Intelligence Facebook Group
    • University of South Florida has written a very useful LibGuide on using AI: website
  • Articles Recommended by ChatGPT

Accuracy: Synthetic Heritage: Online platforms, deceptive genealogy and the ethics of algorithmically generated memory

Bias:  Ethical Challenges and Solutions of Generative AI

Respect for the Deceased:  AI Developments in Genealogy and How They Impact You

Copyright:  Generative AI in Focus: Copyright Office’s Latest Report

Terms of Use:  Navigating the Legal Landscape: Generative AI and Copyright Law

Data Sharing:  Privacy Challenges and Research Opportunities for Genomic Data

Living Relatives:  Ethical Considerations: Navigating Privacy and AI in Family History

Citation/Disclosure:  Disclosing Use of AI for Writing Assistance in Genealogy

Ethical Research: AI Developments in Genealogy and How They Impact You
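
Example: the Confidential Data notes above suggest substituting stand-in names for real names before typing text into a public AI site. A minimal sketch of that idea in Python follows. The function name swap_names, the sample names, and the "Person A" style stand-ins are all made up for illustration. It only catches exact, case-sensitive spellings of the names you list, and it does nothing about other identifying details such as dates or places, so the result still needs a human read-through.

```python
import re


def swap_names(text, mapping):
    """Replace each whole-word occurrence of a key in `mapping` with its value.

    Hypothetical helper for illustration only: matching is exact and
    case-sensitive, so nicknames and misspellings are not caught.
    """
    if not mapping:
        return text
    # Try longer names first so a full name wins over a shorter overlapping one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(r"\b" + re.escape(n) + r"\b" for n in names))
    # A single pass avoids re-replacing text that was just substituted.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Made-up example names; keep this mapping on your own device.
real_to_fake = {
    "Eliza Hartwell": "Person A",
    "Thomas Hartwell": "Person B",
}
fake_to_real = {fake: real for real, fake in real_to_fake.items()}

note = "Eliza Hartwell married Thomas Hartwell in 1952."
safe_note = swap_names(note, real_to_fake)      # text to share with the AI
restored = swap_names(safe_note, fake_to_real)  # swap real names back afterwards
```

Only safe_note would be pasted into the AI tool. The mapping stays private, and the same function run with the reversed mapping puts the real names back into whatever the AI returns.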