Thinking about Generative AI
In this post, I would give a brief introduction of recent popular generative AIs, and then dicsuss some of its pros and cons.
Introduction of recent popular models
In the past year, Generative AI (or AIGC) captured massive attention from all over the world. From Stable-Diffusion to various versions of Text-to-image models in the first half, and ChatGPT to Bing-GPT in the second half, there seems a new level of intelligence has been achieved.
Out of the spotlight, AlphaCode and CodeX has been used to boost programmer's productivity. AlphaTensor successfully discovered a new matrix multiply algorithm. After several years of development in deep learning field, Artificial Intelligence has made a remarkable step forward in multiple areas.
For more generative AI, please refer to Gozalo-Brizuela and
Garrido-Merchan's Work, in which they give an comprehensive overview
of different generative AIs and a brief introduction of various models.
The categorization based on input and output of models split the recent
research into 9 categories as follows:
- text-to-image
- text-to-3D model
- image-to-text (image summary)
- text-to-video
- text-to-audio (also called TTS)
- text-to-text (summary, comprehension, translation, etc..)
- text-to-code
- text-science
- Others
I am not going to give a detailed introduction to the technology behind each model, but discusses the pros and cons of some of the representative models and what to do with this irreversible trend.
Diffusion models - Artists
- Some example images diffusion model generated - obtained from Lexica.art, a wonderful image search engine for generated images
- In case you haven't try these awesome models, below are some
resources of ready-to-use models
- MidJourney Discord
Channel
- HuggingFace model
- MidJourney Discord
Channel
Pros
- Quick to draw.
- Compared to human artists, AI could generate an image in less than
10s.
- Compared to human artists, AI could generate an image in less than
10s.
- Adjustability.
- It may be hard to change the pose, the cloth or other details about
the figure, but for AI, changing a few tags could have this effect. One
could use
impaint
to erase some part of the picture and re-generate. Modifying a few tags could also change the style of the painting significantly.
- It may be hard to change the pose, the cloth or other details about
the figure, but for AI, changing a few tags could have this effect. One
could use
- Image-Image imitate capability.
- Take an image as input, generate a new image based on the given one and the prompt. This feature allows people to create personalized portrait for themselves.
Cons
- Prompt engineering needed.
- As the model name (text-to-image) suggests, one need to provide a description for the desired image. Though many products has encapsulated some prompt engineering work, it is still user's responsibility to understand the basic prompt structure and (know and )choose the tags they need for certain styles. Here are some guides for you to write efficient prompts:
- Serious copyright issues
- This topic is much debated, and most AI companies are vague about
rights of the generated images (correct me if I got this wrong). Laws or
industry rules are still unclear about to which extent of resemblance
should we consider an image is plagiarized. Some may debate that it is
plagiarism for using copyright protected images as the training
input!
- This topic is much debated, and most AI companies are vague about
rights of the generated images (correct me if I got this wrong). Laws or
industry rules are still unclear about to which extent of resemblance
should we consider an image is plagiarized. Some may debate that it is
plagiarism for using copyright protected images as the training
input!
- Unsure quality
- If you are using raw models, it is common to get poorly drawn hands,
fused fingers or more than 2 feet figures. The impact of different
sampling method, CFG scale (how strictly model adhere to prompt) and
sampling steps is significant. These things all add up to unsure quality
for unskilled user of diffusion models.
- A possible negative prompts I collected on the internet:
1
2
3# Negative prompts
(poorly drawn hands), (poorly drawn face), weird, (((fat))), ((cropped)), ((fused fingers)), ((too many fingers)), (malformed limbs), (((bad anatomy))), ((ugly)), out of frame, blurry, gross propotions, distorted face, distorted body, ((distorted fingers)), missing leg, more than 2 leg, more than 2 feet, more than 2 arms, text, ui, signature, icon, watermark, misplaced limbs, leg too big, leg too small, fused hands, fused arms, distorted backgroud, fused buildings, ((finger too short)), more than 1 right hand, more than 1 left hand, wrong direction of limbs, wrong direction of legs, wrong direction of feet, misplaced facial features, unbalanced facial features, ((body too long)), ((arm too short)), (poorly drawn joint), misplaced joint, wrong joint angle, disappearing legs, disappearing arms, disappearing limbs, ((fused limbs)), thumb too long, fingers too long, missing hands, missing arms, ((misplaced animal tail)),less than 2 ears, ((more than 2 ears)),(watermark)
- If you are using raw models, it is common to get poorly drawn hands,
fused fingers or more than 2 feet figures. The impact of different
sampling method, CFG scale (how strictly model adhere to prompt) and
sampling steps is significant. These things all add up to unsure quality
for unskilled user of diffusion models.
What to do against Visual Generations?
- With the rapid development of text to image models, what's the
unique advantages of humans that could hardly be replaced in the short
time (as for now)? How to cope with such situation?
- Embrace the change, use as an assistant / inspiration exploration
tool
- For expert artists, it is expected that AI generated art could
hardly replace them in the foreseeable future with their
consistent-quality creations and zero copyright risks. It is the rookies
in this field that AI could easily replace, outstanding through both
speed and quality.
- Moreover, the latest model is still based on probability theory,
which means AI still have a long way to go to actually have a human-like
sense of aesthetics. Like many novels and movies suggests, human's
creativity by combining different things together is
still outperforming artificial intelligence. Therefore, we could embrace
the development of AI models, use them to flourish relevant industries
(like game or movie making), to assist artists themselves to explore new
ideas, or sharing quick prototypes with clients to achieve accordance
about the requirements.
- Embrace the change, use as an assistant / inspiration exploration
tool
- Solve the Copyright issue!!!!
- It is never excessive to discuss this issue and build consensus between artists and AI developers. What rights could AI generated content have, how should those contents be judged as plagiarism and whether models are allowed to train on these images should be discussed throughly.
- Solve the Copyright issue!!!!
ChatGPT - Search Engine
- For the second half of the year, ChatGPT sweeps the attention of the public with its spectacular performance. Some even claims that ChatGPT could replace the search engine as a general AI assistant. (Now Bing has integrate with GPT-4 perhaps even better)
Pros
- Fantastic comprehension of instructions and obeying them.
- As many may have tried, ChatGPT could understand instructions
ranging from writing a poem to write a simple program. With prompt
engineering (yeah, again), you could even change the ChatGPT into a command
line interface or a cute
chatbot that only reply to you using emoticons.
- As many may have tried, ChatGPT could understand instructions
ranging from writing a poem to write a simple program. With prompt
engineering (yeah, again), you could even change the ChatGPT into a command
line interface or a cute
chatbot that only reply to you using emoticons.
- Long context support
- As a conversational AI, the ability to have long-range context
support is essential. ChatGPT is the most successful product that
implements context support among all large language models I know.
- As a conversational AI, the ability to have long-range context
support is essential. ChatGPT is the most successful product that
implements context support among all large language models I know.
- Ability to fuse content for increasing productivity(compared to
search engines to some extent)
- This ability kinds of automate the search engine's job. When we have
a question, we type it in search engine, read through the first few
articles and deduce the result ourselves. ChatGPT has the capability of
simplifying the latter process, by performing passage summarization and
comprehension task to generate an answer.
- It also empowers copywriters who could combine their companies
product with various marketing templates with a few lines of
instructions.
- With code writing integration, ChatGPT also empowers programmers to write code more efficiently with code templates, package suggestion and debug hints.
- This ability kinds of automate the search engine's job. When we have
a question, we type it in search engine, read through the first few
articles and deduce the result ourselves. ChatGPT has the capability of
simplifying the latter process, by performing passage summarization and
comprehension task to generate an answer.
Cons
- Bad support for factual questions on specific domains
- As a natural defect of probabilistic model, questions based on
actual fact on specific domains may be wrong. For example, if you query
about a specific domain in scientific research, ChatGPT may concat
incorrect author with papers, or even made up one.
- As a natural defect of probabilistic model, questions based on
actual fact on specific domains may be wrong. For example, if you query
about a specific domain in scientific research, ChatGPT may concat
incorrect author with papers, or even made up one.
- Lack of in-depth User comprehension
- Though it is not OpenAI's responsibility to build this module, one
could hardly deny that a well-designed chatbot could perform better than
ChatGPT in understanding the customer and react accordingly, which
suggest the need of further tuning for customer-oriented industry like
companion chatbot.
- Though it is not OpenAI's responsibility to build this module, one
could hardly deny that a well-designed chatbot could perform better than
ChatGPT in understanding the customer and react accordingly, which
suggest the need of further tuning for customer-oriented industry like
companion chatbot.
- Cost of Training / Finetuning
- This may not be a problem for most down-stream companies using APIs to access the language model, however, this may result in monopoly in the large language model area. The cost of training would then be transferred to down-stream users of models.
What to do?
- Against LLMs' weaknesses
- explainability study of LLM
- Poor explainability has been a problem since large language model
became the new paradigm. If we could understand how attention mechanism
understand certain input, it could a huge improvement to mastering the
black box model.
- Poor explainability has been a problem since large language model
became the new paradigm. If we could understand how attention mechanism
understand certain input, it could a huge improvement to mastering the
black box model.
- Factual response research
- As ChatGPT has some defect in factual response generation, it may be valuable to study how to combine knowledge with the generation model. A promising direction is to use knowledge graphs (my introduction to KGs), which explicitly defines knowledge with clear structure. Knowledge Graph based Question Answering (KGQA) seems prospective in generating controllable and correct answers.
- explainability study of LLM
- What could comapnies do to build competitive products?
- Ability to understand the user
- This includes the recognition of the user's current mood
- Understanding of the person from past interaction data collection
- Reasoning about the user profile after big data analysis
- Content expertise and richness in verticals (psychological
counseling, healing as an example)
- Multiple rounds of dialogues for different life scenarios
- Development of common counseling scenarios
- R&D of generative models based on consulting corpus (to be launched)
- Customization support for specific business
- ChatGPT as a large model, the deployment cost and continued training cost are not small. From this perspective, we can sell customized small model services, segmentation, and compete with ChatGPT in the niche area of expertise.
- Based on the optimized knowledge base, use different knowledge bases to support different roles of chatbots and adapt to different scenarios.
- Ability to understand the user