Thinking about Generative AI

In this post, I will give a brief introduction to recent popular generative AI models, and then discuss some of their pros and cons.

Introduction to recent popular models

In the past year, Generative AI (or AIGC) has captured massive attention from all over the world. From Stable Diffusion to various text-to-image models in the first half, and from ChatGPT to Bing-GPT in the second half, it seems a new level of intelligence has been achieved.

Out of the spotlight, AlphaCode and Codex have been used to boost programmers' productivity, and AlphaTensor successfully discovered a new matrix multiplication algorithm. After several years of development in the deep learning field, artificial intelligence has made a remarkable step forward in multiple areas.

For more generative AI models, please refer to Gozalo-Brizuela and Garrido-Merchan's work, which gives a comprehensive overview of different generative AIs and a brief introduction to the various models. Their categorization, based on the input and output of the models, splits recent research into 9 categories as follows:
- text-to-image
- text-to-3D model
- image-to-text (image summary)
- text-to-video
- text-to-audio (including text-to-speech, i.e. TTS)
- text-to-text (summary, comprehension, translation, etc..)
- text-to-code
- text-to-science
- Others

I am not going to give a detailed introduction to the technology behind each model, but will instead discuss the pros and cons of some representative models and what to do about this irreversible trend.

Diffusion models - Artists

  • Some example images generated by diffusion models - obtained from Lexica.art, a wonderful image search engine for generated images

[Example figures: a typical diffusion-generated illustration, a 3D-art style image, and a photorealistic image]

Pros

  • Quick to draw.
    • Compared to human artists, an AI model can generate an image in less than 10 seconds.
  • Adjustability.
    • It may be hard for a human artist to change the pose, the clothing or other details of a figure, but for AI, changing a few tags is enough. One can also use inpainting to erase part of the picture and regenerate it, and modifying a few tags can change the style of the painting significantly.
  • Image-to-image capability.
    • Take an image as input and generate a new image based on the given one and the prompt. This feature allows people to create personalized portraits of themselves (a short code sketch follows this list).
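
As a rough illustration of the adjustability and image-to-image points above, here is a minimal sketch using the Hugging Face diffusers library. The checkpoint name, file names and parameter values are only illustrative assumptions, and the exact API may differ between library versions.

    # Minimal image-to-image sketch with diffusers (details vary by version).
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    # "runwayml/stable-diffusion-v1-5" is just a commonly used public checkpoint.
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Placeholder input image; any 512x512 RGB picture works.
    init_image = Image.open("my_portrait.png").convert("RGB").resize((512, 512))

    # Changing a few prompt "tags" changes pose, clothing or style;
    # `strength` controls how far the output may drift from the input image.
    result = pipe(
        prompt="portrait of a person, oil painting, baroque style, detailed face",
        image=init_image,
        strength=0.6,
        guidance_scale=7.5,
    ).images[0]
    result.save("stylized_portrait.png")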

Cons

  • Prompt engineering needed.
  • Serious copyright issues
    • This topic is much debated, and most AI companies are vague about the rights of generated images (correct me if I got this wrong). Laws and industry rules are still unclear about the extent of resemblance at which an image should be considered plagiarized. Some even argue that using copyright-protected images as training input is itself plagiarism!
  • Inconsistent quality
    • If you are using raw models, it is common to get poorly drawn hands, fused fingers or figures with more than 2 feet. The impact of the sampling method, the CFG scale (how strictly the model adheres to the prompt) and the number of sampling steps is significant. These all add up to unpredictable quality for unskilled users of diffusion models (a code sketch after the prompt below shows where these settings go).
    • A possible negative prompt I collected on the internet:
      # Negative prompts
      (poorly drawn hands), (poorly drawn face), weird, (((fat))), ((cropped)), ((fused fingers)), ((too many fingers)), (malformed limbs), (((bad anatomy))), ((ugly)), out of frame, blurry, gross propotions, distorted face, distorted body, ((distorted fingers)), missing leg, more than 2 leg, more than 2 feet, more than 2 arms, text, ui, signature, icon, watermark, misplaced limbs, leg too big, leg too small, fused hands, fused arms, distorted backgroud, fused buildings, ((finger too short)), more than 1 right hand, more than 1 left hand, wrong direction of limbs, wrong direction of legs, wrong direction of feet, misplaced facial features, unbalanced facial features, ((body too long)), ((arm too short)), (poorly drawn joint), misplaced joint, wrong joint angle, disappearing legs, disappearing arms, disappearing limbs, ((fused limbs)), thumb too long, fingers too long, missing hands, missing arms, ((misplaced animal tail)),less than 2 ears, ((more than 2 ears)),(watermark)
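
The defects and knobs mentioned above map directly onto the generation parameters. Below is a minimal sketch of how a negative prompt, the sampling method, the CFG scale and the number of sampling steps are typically passed to a Stable Diffusion pipeline in the diffusers library; the checkpoint name and parameter values are illustrative assumptions only.

    import torch
    from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Swap the sampling method (scheduler); Euler-ancestral is a popular choice in web UIs.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    image = pipe(
        prompt="a girl standing in a flower field, highly detailed, soft lighting",
        negative_prompt="poorly drawn hands, fused fingers, bad anatomy, blurry, watermark",
        guidance_scale=7.5,       # CFG scale: how strictly the model adheres to the prompt
        num_inference_steps=30,   # sampling steps
    ).images[0]
    image.save("sample.png")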

What to do about visual generation models?

  • With the rapid development of text-to-image models, what are the unique advantages of humans that can hardly be replaced in the short term (as of now)? How should we cope with this situation?
    1. Embrace the change: use it as an assistant / inspiration-exploration tool
    • For expert artists, AI-generated art can hardly replace them in the foreseeable future, given their consistent-quality creations and zero copyright risk. It is the rookies in this field that AI could easily replace, outperforming them in both speed and quality.
    • Moreover, the latest models are still based on probability theory, which means AI still has a long way to go to acquire a human-like sense of aesthetics. As many novels and movies suggest, human creativity in combining different things still outperforms artificial intelligence. Therefore, we could embrace the development of AI models, use them to boost relevant industries (like game or movie making), assist artists in exploring new ideas, or share quick prototypes with clients to reach agreement on requirements.
    2. Solve the copyright issue!
    • It is never excessive to discuss this issue and build consensus between artists and AI developers. What rights AI-generated content could have, how such content should be judged as plagiarism, and whether models are allowed to train on these images should all be discussed thoroughly.

ChatGPT - Search Engine

  • In the second half of the year, ChatGPT swept the attention of the public with its spectacular performance. Some even claim that ChatGPT could replace the search engine as a general AI assistant. (Now Bing has integrated GPT-4, perhaps doing this even better.)

Pros

  • Fantastic comprehension of instructions and ability to follow them.
    • As many may have tried, ChatGPT can understand instructions ranging from writing a poem to writing a simple program. With prompt engineering (yeah, again), you can even turn ChatGPT into a command-line interface or a cute chatbot that only replies to you using emoticons (a small code sketch follows this list).
  • Long context support
    • As a conversational AI, the ability to keep long-range context is essential. ChatGPT is the most successful product implementing context support among all the large language models I know of.
  • Ability to fuse content for increased productivity (compared to search engines, to some extent)
    • This ability partially automates the search engine's job. When we have a question, we type it into a search engine, read through the first few articles and deduce the answer ourselves. ChatGPT can simplify the latter process by performing passage summarization and comprehension to generate an answer.
    • It also empowers copywriters, who can combine their company's products with various marketing templates using a few lines of instructions.
    • With code-writing integration, ChatGPT also empowers programmers to write code more efficiently with code templates, package suggestions and debugging hints.
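
To make the prompt engineering and long-context points above concrete, here is a minimal sketch against OpenAI's chat completions API using the openai Python package; the model name, the system prompt and the example messages are only illustrative assumptions.

    from openai import OpenAI  # assumes the openai package, v1+ client style

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # A system prompt "re-programs" the assistant: here, an emoticon-only chatbot.
    history = [
        {"role": "system",
         "content": "You are a cute chatbot. Reply using emoticons only, no plain text."}
    ]

    def chat(user_message: str) -> str:
        # The whole conversation is re-sent every turn; that is what gives the
        # model its long-range context over the dialogue.
        history.append({"role": "user", "content": user_message})
        resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=history)
        reply = resp.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        return reply

    print(chat("I passed my exam today!"))
    print(chat("What was I just talking about?"))  # answered from accumulated context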

Cons

  • Poor support for factual questions in specific domains
    • As a natural defect of probabilistic models, answers to factual questions in specific domains may be wrong. For example, if you ask about a narrow area of scientific research, ChatGPT may attribute papers to the wrong authors, or even make papers up.
  • Lack of in-depth user comprehension
    • Though it is not OpenAI's responsibility to build this module, one could hardly deny that a well-designed chatbot could outperform ChatGPT in understanding a customer and reacting accordingly, which suggests the need for further tuning in customer-oriented applications such as companion chatbots.
  • Cost of training / finetuning
    • This may not be a problem for most downstream companies that access the language model through APIs; however, it may lead to a monopoly in the large language model area, and the cost of training would then be passed on to downstream users of the models.

What to do?

  • Against LLMs' weaknesses
    • Explainability study of LLMs
      • Poor explainability has been a problem since large language models became the new paradigm. If we could understand how the attention mechanism interprets a given input, it would be a huge step toward mastering the black-box model.
    • Factual response research
      • As ChatGPT has some defects in factual response generation, it may be valuable to study how to combine knowledge with the generation model. A promising direction is to use knowledge graphs (my introduction to KGs), which explicitly define knowledge with a clear structure. Knowledge graph based question answering (KGQA) seems promising for generating controllable and correct answers (a toy code sketch appears at the end of this section).
  • What could companies do to build competitive products?
    • Ability to understand the user
      1. This includes recognition of the user's current mood
      2. Understanding of the person from past interaction data
      3. Reasoning about the user profile from large-scale data analysis
    • Content expertise and richness in verticals (psychological counseling and emotional healing, for example)
      1. Multiple rounds of dialogue for different life scenarios
      2. Development of common counseling scenarios
      3. R&D of generative models based on a counseling corpus (to be launched)
    • Customization support for specific businesses
      1. ChatGPT is a large model, so its deployment cost and continued training cost are not small. From this perspective, one can sell customized small-model services, segment the market, and compete with ChatGPT in niche areas of expertise.
      2. Based on an optimized knowledge base, use different knowledge bases to support different chatbot roles and adapt to different scenarios.
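
As a toy illustration of the knowledge-graph direction mentioned under "Factual response research" above, the sketch below retrieves matching triples from a tiny hand-written knowledge graph and turns them into grounding text that a generative model could be asked to answer from; the triples, function names and prompt format are all made up for illustration.

    # Toy sketch of KG-grounded question answering: facts come from explicit
    # triples instead of the language model's parameters.
    TRIPLES = [
        ("Attention Is All You Need", "author", "Vaswani et al."),
        ("Attention Is All You Need", "published_in", "NeurIPS 2017"),
        ("BERT", "author", "Devlin et al."),
    ]

    def retrieve_facts(question: str) -> list:
        # Naive retrieval: keep triples whose subject appears in the question.
        return [t for t in TRIPLES if t[0].lower() in question.lower()]

    def grounded_prompt(question: str) -> str:
        facts = retrieve_facts(question)
        context = "\n".join(f"{s} | {r} | {o}" for s, r, o in facts)
        # A downstream generation model would be instructed to answer only from
        # this context, keeping authorship facts controllable and checkable.
        return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"

    print(grounded_prompt("Who wrote Attention Is All You Need?"))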
