Evaluating the Need for Jailbreaking ChatGPT

Frank

The Battle of Brains: Evaluating the Features and Capabilities Between Claude AI and ChatGPT

Since its release in November 2022, ChatGPT has remained the dominant force in the AI chatbot space. Despite far-reaching efforts by several AI companies, no one has really been able to build a chatbot that truly challenges ChatGPT in overall response quality. Google’s Bard? Microsoft’s Bing AI? No, not really.

However, Claude AI, a chatbot built by AI startup Anthropic, shows qualities of a chatbot that can dethrone ChatGPT. A considerable number of users are already saying Claude is the better option. But is this the case? Let’s take both chatbots for a spin.

Disclaimer: This post includes affiliate links

If you click on a link and make a purchase, I may receive a commission at no extra cost to you.

ChatGPT vs. Claude AI: Common-Sense and Logical Reasoning

There’s an intriguing contrast when working with AI chatbots. On one hand, they can whiz through complex tasks that humans may labor over for days to solve. On the other hand, they sometimes grapple with elementary problems that require just a bit of common-sense or logical reasoning. So, we tested both ChatGPT and Claude AI to see which AI chatbot was better at common sense and logical reasoning tasks.

logical and commonsense problem

ChatGPT broke the problem into parts and solved it on the first attempt. Claude AI also solved the problem, but with a different approach.

Claude AI solving a commonsense and logical reasoning problem

For the first task, both chatbots were able to crack the problem. So, we moved on to a different kind of problem. We tasked both chatbots with answering a trick question.

ChatGPT Answers Trick Question-1

ChatGPT immediately spotted the trick: you can’t bury survivors because they aren’t dead. Claude AI, on the other hand, seemed to understand that it was a trick question but missed the most common-sense point, which is that you don’t bury survivors.

Instead, it over-analyzed the question and came to the conclusion that there would be “no survivors to bury” because crashing from Mars to Earth would be fatal. It is not the answer we expected, but if you look at things from a different angle, there is some truth to it.

Claude AI answers trick question

We give this task to ChatGPT, but we can’t totally rule out Claude AI’s approach. For our final task on this metric, we asked both chatbots how many apples would be left on an apple tree after five days and after 10 days if we started with 10 apples and five of them got sliced while still on the tree. ChatGPT said there’d still be 10 apples left.

ChatGPT birds commonsense logic

Claude AI, on the other hand, gave a more common-sense response by recognizing that the five sliced apples are likely to rot.

Claude AI Common sense reasoning with Apple rotting

Claude AI clearly got this one. We tried a few more tricky problems, and both chatbots had their share of successes and failures. Based on the outcomes we observed, it is fair to say that while ChatGPT has an edge, the two chatbots are not far apart in common-sense and logical reasoning abilities.

ChatGPT vs. Claude AI: Math Skills

Even if you never plan to use ChatGPT or Claude AI to solve your algebra homework, their mathematical abilities have far-reaching implications. For AI chatbots, math is the key to understanding real-world logic, identifying flawed thinking, and admitting mistakes.

Essentially, math proficiency is a core metric of artificial intelligence. So, between ChatGPT and Claude AI, which chatbot is more proficient in math? We tasked both chatbots with solving a twisty math productivity problem. We started with Claude AI, and the chatbot cracked the problem.

Claude AI solves maths problem on productivity

ChatGPT cracked the problem as well.

ChatGPT solves maths problem on productivity

Moving on, we asked both chatbots to solve 8/(a-1) = 20/(3a-1), a fairly straightforward math problem with a surprisingly high failure rate among AI chatbots. ChatGPT solved it, providing the correct answer of a = -3 on the first attempt.

ChatGPT solves a math problem
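For readers who want to check the answer themselves, here is a minimal sketch in Python. It assumes the equation reads 8/(a-1) = 20/(3a-1), which is the reading consistent with the answer of -3 reported above. The function name and parameters are our own, chosen for illustration, and are not taken from either chatbot's output.

```python
from fractions import Fraction


def solve_ratio_equation(p, b, q, c, d):
    """Solve p/(a + b) = q/(c*a + d) for a with exact rational arithmetic.

    Cross-multiplying gives p*(c*a + d) = q*(a + b),
    which rearranges to a*(p*c - q) = q*b - p*d.
    """
    coeff = p * c - q
    if coeff == 0:
        raise ValueError("no unique solution")
    a = Fraction(q * b - p * d, coeff)
    # A root that zeroes either denominator is not a valid solution.
    if a + b == 0 or c * a + d == 0:
        raise ValueError("root makes a denominator zero")
    return a


# Assumed reading of the article's equation: 8/(a - 1) = 20/(3a - 1)
a = solve_ratio_equation(p=8, b=-1, q=20, c=3, d=-1)
left = Fraction(8) / (a - 1)
right = Fraction(20) / (3 * a - 1)
print(a, left, right)  # prints: -3 -2 -2
```

Both sides evaluate to -2 when a = -3, so the answer holds under this reading of the equation.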

Claude AI failed at the first attempt, but when we prompted it to solve the problem step by step (which forces it to think through every step of its logic) it was able to crack it.

Claude AI solves a math problem step-by-step

We tried a few more math problems. While both chatbots got it right on the first try in some cases, in several instances, Claude AI needed a second or third attempt to provide the right response. In terms of math skills, we’ll give the crown to ChatGPT.

ChatGPT vs. Claude AI: Creativity

One of Claude AI’s most hyped strengths is its creative ability. But can it match ChatGPT’s creativity? Or could it possibly surpass ChatGPT? To put both chatbots to the test, we tasked them with writing lyrics for a rap song that rhymes.

We chose a rhyming rap test because it is something a lot of language models struggle with. Most models either fail to get the rhyming right, or get the rhyming right while the lyrics themselves don’t make sense. To make things more interesting, the rap song would be about growing cucumbers.

So, we asked both ChatGPT and Claude AI to “write a rhyming rap about growing cucumbers as a farmer and becoming a millionaire from it.” ChatGPT went first, and as expected, it produced some exciting lyrics.

ChatGPT composes rap lyrics

We then fed the same prompt to Claude AI, and it gave it a fair shot as well.

Claude AI composes rap lyrics

Both sets of lyrics are good, but ChatGPT seemed to have an edge here. It had better rhyming, and we got the result we needed on the first try. We had to try three times before Claude AI could produce lyrics that rhymed. We’ll give this one to ChatGPT.

After trying out a few more creative tasks, Claude AI seemed to excel in writing-related tasks and was able to write more natural-sounding content, like a human writer would. Although ChatGPT was better at handling more complex creative tasks, it sometimes couldn’t shake off that AI chatbot feeling in the text it generated. Our verdict? Both ChatGPT and Claude AI are creative in their own right.

ChatGPT vs. Claude AI: Coding Skills

Just like math skills, coding skills are another very important metric for judging the abilities of an AI chatbot. While the majority of users will probably never use a chatbot for coding, a chatbot’s ability to write and understand code proficiently has significant underlying implications.

While chatbots are currently sophisticated, they are far from what they could become if and when they’re able to write code proficiently. For AI chatbots to truly evolve into powerful AI assistants that can do more than generate text, they need to be able to write code that solves problems on demand. We’ve previously discussed how important coding skills are to AI chatbots in our ChatGPT Code Interpreter explainer.

That said, we gave both chatbots two coding tasks. First, we asked ChatGPT and Claude AI to write functional code for a to-do list app. Starting with ChatGPT, the AI chatbot delivered a functional to-do list app on the first attempt. We copied the code, ran it in a browser, and it worked without errors. Here’s the output in a browser.

to-do list app by ChatGPT
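We don’t reproduce either chatbot’s code here. As a rough idea of the core logic a to-do list app needs (adding, completing, and removing tasks), here is a minimal sketch in Python rather than browser code. The class and method names are our own assumptions for illustration and do not come from ChatGPT’s or Claude AI’s output.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    done: bool = False


@dataclass
class TodoList:
    tasks: list = field(default_factory=list)

    def add(self, title):
        # Reject blank titles so the list never holds empty entries.
        title = title.strip()
        if not title:
            raise ValueError("task title must not be empty")
        self.tasks.append(Task(title))

    def complete(self, index):
        self.tasks[index].done = True

    def remove(self, index):
        del self.tasks[index]

    def pending(self):
        # Titles of tasks that are not yet marked done.
        return [t.title for t in self.tasks if not t.done]


todo = TodoList()
todo.add("Water the cucumbers")
todo.add("Write rap lyrics")
todo.complete(0)
print(todo.pending())  # prints: ['Write rap lyrics']
```

A browser version would wrap the same three operations in HTML inputs and event handlers, which is where missing logic tends to stop an otherwise readable program from running.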

Moving on to Claude AI, the chatbot wrote clearly intelligible code. The structure and logic all seemed fine. Unfortunately, despite repeated attempts, Claude AI kept missing some critical logic needed to make the code actually run in a browser. It’s a fail on this one.

After Claude AI failed the last test, we tried a different kind of coding task, one that was more about analyzing code and less about writing new code. We uploaded five PHP files that represent the complete backend for a website and asked both Claude AI and ChatGPT where in the uploaded files we would need to make edits to ensure we receive an email once a new user registers on the site.

Claude AI analyzing multiple PHP files

Surprisingly, ChatGPT, which seemed to have superior coding skills, failed at this even after repeated attempts. Claude AI, on the other hand, analyzed the code proficiently and identified the right places that needed to be edited to achieve the desired results.

This was not an isolated case. We repeated the test with several other code files, and ChatGPT stumbled and stalled in the majority of cases while Claude AI kept delivering impressive results. In terms of coding skills, the winner is not entirely straightforward.

In our tests, ChatGPT was significantly better at writing new code and could manage complex code with impressive proficiency. However, Claude AI was significantly better at analyzing large code bases. So, if you’re looking to write code for some new idea you have, ChatGPT is the tool to turn to. If you want to analyze or make sense of a code base with thousands of lines across several files, then we would recommend Claude AI.

Claude AI Is a Potent Competitor on the Block

Claude AI represents potent competition for ChatGPT, one that can compete with and potentially surpass ChatGPT someday. Given that Claude is a relatively new AI model, it is impressive that it can take on ChatGPT the way it currently does. Claude AI’s emergence and the quality it offers are proof that the competition is heating up.

  • Title: Evaluating the Need for Jailbreaking ChatGPT
  • Author: Frank
  • Created at : 2024-12-21 23:37:03
  • Updated at : 2024-12-29 05:23:41
  • Link: https://tech-revival.techidaily.com/evaluating-the-need-for-jailbreaking-chatgpt/
  • License: This work is licensed under CC BY-NC-SA 4.0.