Microsoft Copilot Chatbot Review: Bing Is My Default Search Engine Now

Pros
- Uses GPT-4 and GPT-4 Turbo
- Free
- Accurately links to relevant information
- Includes emojis and images in responses
Cons
- While prettier, not as cleanly organized as ChatGPT and Claude
- Jumping between different modes requires an entirely new search
- Can avoid making definitive statements
- Refuses to answer prompts deemed controversial
Basic info:
- Price: Free
- Availability: Web, Windows 11 or mobile app
- Features: Voice recognition, connection to the open internet and Bing, ability to tune answers to be either more creative or more precise
- Image generation: Yes
For Microsoft search engineers, there’s probably no greater reward than telling them you’ve switched your default search engine from Google to Bing. Sure, it took a multibillion-dollar investment from Microsoft to integrate OpenAI’s GPT-4 tech into its engine. But when Bing is sitting at 3.3% global market share, compared with Google’s 91.6%, drastic measures have to be taken.
The thing is, I’m not really using Bing. I’m actually using Copilot, Microsoft’s renamed AI chatbot that’s part of Bing.
What makes Copilot unique is that it’s essentially three GPT engines in one. Copilot has three modes: balanced, precise and creative. As of this review, the balanced and precise modes are using GPT-4, a model by OpenAI, creator of ChatGPT, that reportedly has over 1 trillion parameters. That’s significantly more than ChatGPT 3.5, which has 175 billion. Creative, however, is using GPT-4 Turbo, which uses data up until April 2023, versus September 2021 in GPT-4. It can also handle significantly more text at once, the equivalent of more than 300 pages in a single prompt. It’s uncertain when Microsoft will bring the power of GPT-4 Turbo to Copilot’s balanced and precise modes.
Copilot is the best of both ChatGPT and Google’s Gemini. It has the accuracy and fine-tuning of ChatGPT with the internet connectivity found in Gemini. This means that answers read more like a human wrote them and it can pull up-to-date information from the internet. In fact, Copilot delivers such good results it’s a wonder why Microsoft isn’t charging for it.
While Copilot can generate images, we won’t be testing that feature for the purposes of this review.
How CNET tests AI chatbots
CNET takes a practical approach to reviewing AI chatbots. Our goal is to determine how good each one is relative to the competition and which purposes it serves best. To do that, we give the AI prompts based on real-world use cases, such as finding and modifying recipes, researching travel or writing emails. We score the chatbots on a 10-point scale that considers factors such as accuracy, creativity of responses, number of hallucinations and response speed. See how we test AI for more.
Do note that Microsoft collects data when you use Copilot, and this includes Copilot integrations in Word, PowerPoint, Excel, OneNote, Loop and Whiteboard.
Shopping
As a hot sauce aficionado, I’ve been following the recent drama surrounding Huy Fong Foods, the purveyors of the iconic red sriracha sauce, and how the flavor has changed since its hiatus and recent return. Turns out, there’s been an ongoing dispute with its original jalapeño supplier and Huy Fong Foods now sources chilis from Mexico. To add another wrinkle to this saga, Underwood Ranches, the original jalapeño supplier, has entered the market with its own sriracha sauce.
I asked Copilot if it could help describe the differences I should expect between the new sriracha from Huy Fong and the copycat from Underwood Ranches. Copilot excelled in giving a full breakdown with specific language and even gave a quick summary of the ongoing corporate drama.
Copilot described Huy Fong’s sriracha as more garlicky, with sweeter notes and less spice than before, while Underwood Ranches has added kick and is more reminiscent of the old sriracha. This description fell in line with other testimonies I’ve seen on YouTube and Reddit.
Unlike Gemini and ChatGPT 3.5, Copilot gave specific descriptors and laid the information out in a manner that was easier to follow.
Beyond sriracha sauces, I’ve also been in the market for a new TV. In comparing last year’s LG OLED C3 and G3 models, Copilot did a good job breaking down the differences and explaining which one would be the better buy. It got the key details right, like the fact that both televisions use the same processor and that the G3 gets brighter. However, it didn’t make the kinds of definitive arguments that Gemini did when prompted with the same question.
But when I asked the same question in Copilot’s “creative” mode, which uses GPT-4 Turbo, it provided answers that felt more thought out, rather than a string of boilerplate bullet points. Here, Copilot put together cogent thoughts on brightness, design and performance, with a concluding paragraph explaining that, for most people, the increased brightness of the more expensive G3 won’t be noticeable.
Copilot in “creative” mode felt most like Claude. Information was better synthesized and did feel like it was put together by a real person. Gemini and Perplexity performed similarly, with sharp descriptions and little fence-sitting. While all the AI chatbots performed well, I’d have to give the edge to Copilot and Claude.
ChatGPT 3.5 currently can’t make these kinds of shopping comparisons, as its training data only goes up to September 2021.
Recipes
Sometimes finding a good recipe online can be a chore. Popular dishes can vary wildly, making it difficult to find the best one. Plus, having to scroll through long-winded preambles about memorable flavors of yore can get tiresome. An AI can filter through all the fluff and generate recipes immediately.
Copilot did a decent job of generating a chicken tikka recipe in creative mode. It got the basic ingredients down, as well as a list of instructions on how to prepare the mixture. However, it left out harder-to-find ingredients, ones that Gemini did capture, like Kashmiri chili powder, chaat masala and amchur, a dried mango powder.
I was curious what answer Copilot would yield if I switched to precise mode. Interestingly, it included mustard powder, which isn’t as common, and kasuri methi, or dried fenugreek.
Given Copilot’s trifurcated nature, you’ll have to weigh which mode within Copilot might yield the best answer. Just because creative uses GPT-4 Turbo doesn’t mean it will give the best result for every query.
Overall, Google Gemini performed best in this test, providing the most robust recipe. This was followed by Copilot in precise mode. ChatGPT 3.5, Perplexity and Claude all performed similarly, with very basic recipes.
Research
The power of AI in doing research is that the model can look at multiple pieces of information and help find linking points in seconds. Normally, this would require you to read through research papers yourself to make these sorts of connections. Copilot not only does this well, but links to sources, too.
Copilot gets excellent marks as a research tool. When I asked Copilot about the relationship between homeschooling and neuroplasticity, it pulled up research papers related to childhood education and brain development, and it even linked directly to PDF files containing the research.
I then switched to creative mode and got an even better response, with Copilot synthesizing additional sources and giving more nuanced answers. It felt as if Copilot had a greater understanding of the topic and the complexities different schooling environments present.
Copilot in creative mode and Claude performed similarly in this test, and beat out Gemini, ChatGPT 3.5 and Perplexity. And unlike Gemini, all of the papers Copilot cited were real. It didn’t make up the names of research papers in the way that Gemini did.
While ChatGPT 3.5 was also accurate in recommending and summarizing research papers, it’s not connected to the open internet, so it can only recommend you go to Google and search for them yourself.
Summarizing articles
Copilot does a decent job of summarizing articles, but like all the other AI chatbots we’ve tested, it frequently fails to capture the central focus.
Copilot, like Gemini, ChatGPT 3.5, Perplexity and Claude, was able to capture the basic points of an article I wrote earlier this year about AI at CES 2024. But all of them seemed unable to pinpoint the main crux of the piece: that a lot of AI hype is a rebranding of older smart tech.
Can Copilot give you a good rundown of an article in a pinch? Sure. Should you rely on article summaries for a class presentation? Probably not.
Travel
The internet is glutted with travel recommendations. Between blogs, travel news publishers, TikTokers and YouTubers, so many people are trying to fill you in on the best sights and eats in iconic cities like Paris or London. But what about Columbus, Ohio? That’s where AI can come into play with its ability to glean data from across the web and synthesize information about lesser-traveled locations.
When I asked Copilot for a three-day travel itinerary to Columbus, it performed spectacularly well in putting together recommendations for locations and restaurants in a bullet-pointed, easy-to-understand format. We cross-referenced Copilot’s results with CNET’s Bella Czajkowski, who hails from Cowtown. Copilot also did a great job weaving in bonus recommendations, something ChatGPT 3.5 and Gemini neglected to do.
All the restaurants Copilot recommended were real. It didn’t make up restaurants like Google Gemini did. And I have to hand it to the Microsoft team for coding Copilot to also bake emoji into responses. It adds that slight hint of personality and makes a lengthy set of travel recommendations easier to follow. For example, if you want to pinpoint the bar recs, look for the beer emoji.
Copilot outperformed all the other AI bots tested. Copilot made recommendations for locales and restaurants, all of which exist and are still open, producing articulate and accurate results with easy-to-follow language and structure. ChatGPT performed adequately, despite not being connected to the open internet.
Writing emails
Like every other chatbot tested, Copilot does well at writing basic emails. You can easily ask Copilot to tune an email to be more or less formal. Regardless of the tone you go with, emails read as believable.
When I asked Copilot to create an article pitch on racier topics, however, like the increased sexualization of online content creators and the ongoing changes in parasocial relationships with fans across the internet, Microsoft’s AI engine refused to engage in discussions about explicit content or the moral and ethical qualms related to it.
All the other AI chatbots were able to take on this task. Claude performed the best, creating a pitch that was compelling and written well enough to be passed off as human-made.
Better than ChatGPT, Gemini or Perplexity
Copilot is versatile and can generate responses to be creative or precise, something the other AI chatbots can’t do unless prompted to. The way Copilot presents information, often with bullet points and emojis, makes it easy to read. It’s also accurate, linking to actual pieces of news and information, and it showed no instances of hallucinations, at least in our testing.
While Copilot doesn’t have Claude’s personality, it usually performs at or beyond Claude’s level, depending on the task. Microsoft, however, has seemingly put high guardrails on Copilot, which means that it will refuse to answer dicier questions, even when the use is legitimate.
Microsoft Copilot is excellent. And it should be, right? It’s powered by GPT-4 and GPT-4 Turbo, and has access to Bing’s search data to help bolster its generative capabilities. Getting access to GPT-4 tech with ChatGPT requires a $20 monthly subscription. My recommendation: Don’t pay $20 per month when Microsoft is giving away OpenAI’s tech for free.
Editor’s note: CNET is using an AI engine to help create a handful of stories. Reviews of AI products like this, just like CNET’s other hands-on reviews, are written by our human team of in-house experts. For more, see CNET’s AI policy and how we test AI.