While the API is in closed beta, you can try out a version of it on AI Dungeon. AI Dungeon is geared toward text adventure games but can be used for anything. I tried it out by asking Turing-test-style questions to see how well it "understands" the world. Initially I was not very impressed, as I posted to Reddit. However, there were a few things I missed:
- By default, AI Dungeon uses a hybrid between GPT-2 and GPT-3. Once I updated it to use the "dragon" model (and reduced the randomness), it switched to full GPT-3 and the results improved.
- For some questions, GPT-3 needs to be primed in that category before being asked a question. For example, it can solve simple math questions if it's given a few examples first. I think this is impressive: in the future it may be able to answer such questions without the priming, and in the meantime specific applications can fine-tune it.
- A commenter on Reddit reported using the actual API and getting a better result for one or two of the questions. It's possible that AI Dungeon primes GPT-3 to answer prompts in a certain style, and that this makes it do worse on general questions.
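To make the priming idea concrete, here is a minimal Python sketch of how a few worked examples can be placed in front of a new question so the model sees the pattern before it has to continue it. The function name `build_few_shot_prompt` and the sample arithmetic questions are illustrative assumptions, not the prompts from the session described above, and the sketch only builds the text; it does not call AI Dungeon or the API.

```python
def build_few_shot_prompt(examples, question):
    """Format worked Q/A pairs, then the new question with its answer left open."""
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    # The final "A:" is left empty so a text predictor continues with the answer.
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


# Hypothetical priming examples (assumed for illustration only).
math_examples = [
    ("What is 2 + 3?", "5"),
    ("What is 7 + 1?", "8"),
]

prompt = build_few_shot_prompt(math_examples, "What is 4 + 5?")
print(prompt)
```

The resulting text would be pasted in (or sent) as the prompt; the same Q:/A: layout is what the questions further down use.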
With the right settings in place, GPT-3 is able to answer a wide range of questions, including questions involving ambiguous grammar. For example, here are my questions and its answers:
Q: Arjun lived in an Indian jungle. One night he heard a noise, went outside and shot a bear in his pajamas. Where was the bear?
A: The bear was in the jungle.
Q: Arjun lived in an Indian jungle. One night he woke up and squished a spider in his pajamas. Where was the spider?
A: The spider was in his pajamas.
I can see GPT-3 being useful in many applications from search engines to online tutorials. It can also be used for less noble purposes such as advanced spamming and trolling, but OpenAI plans to control usage of it carefully.
GPT-3 went from being basically ignored to over-the-top hype very quickly. While it's certainly impressive how much it can do, it's quite far from a general AI that can do anything. It's still just a text predictor without a fundamental understanding of the world. For example, while it can answer comparison questions within a specific category, it struggles with cross-category comparisons, even after being primed for them. I asked it the following question after having provided it with similar questions and answers:
Q: Which of these is largest? Planet Venus or an elephant?
It answered:
A: an elephant
I tried a few more similar questions, but it consistently got them wrong. I assume the elephant wins most size comparisons on the internet, but it's still smaller than a planet. Assuming the standard API doesn't do better, this seems to be a large blind spot for GPT-3. Text prediction can answer many questions, but sometimes better knowledge is needed. It would be interesting if OpenAI could find a way to combine GPT-3's text prediction with a structured knowledge algorithm. In the meantime, humans are still needed.