are we ready for the age of copilots?

I had the opportunity to attend two of Microsoft’s major conferences this year in person: in May, Microsoft Build; and in November, Microsoft Ignite. While primarily I was there to run my company’s sponsored booth, I did attend the keynotes and deeply reviewed the Book of News with all of the announcements.

People were joking that the conferences should be renamed “Microsoft Copilot Conference”. Both times. Heads were almost spinning with all the new Copilot announcements, and immediately the efforts started to make sense of what this means for developers, and what this means for businesses.

Of course, selfishly, I was also thinking about what this means for me? Certainly, as a Solution Architect specializing in Power Platform, I needed to get up to speed on Power Apps Copilot for Makers, and the new generative plugins for Power Virtual Agents– sorry, Microsoft Copilot Studio. But within the whirlwind of announcements about M365 Copilot, the blurring lines between low code in Power Platform and pro code in Azure, and the unbelievable speed at which these changes are hurtling in our direction, it is obvious that my experience as an employee and as a consumer will be changing fast, too. And it got me wondering, am I ready? Are we ready?

And so, trying to embrace a growth mindset, I have been spending time over the past few months to test out and play with generative AI capabilities where I have access. My results have been interesting, and somewhat mixed. So let’s dive in and see if we can sort out some reality from all the hype, and really gauge what it’s going to take to be ready for the age of copilots.

test case #1: researching with Microsoft Copilot

Microsoft Copilot versus ChatGPT

Microsoft Copilot (the new name for Bing Chat/Bing Chat Enterprise) uses OpenAI’s GPT-4 model, which gets much of its training data on the open Internet to inform the Large Language Model. As a consumer, I prefer Microsoft Copilot chat to OpenAI’s ChatGPT for a few reasons. For one, Microsoft Copilot uses GPT-4 and the free version of ChatGPT uses GPT-3.5. GPT-4 outperforms 3.5 across every measure, and is rumored to have more than 1 trillion parameters versus 3.5’s 175 billion. Microsoft has extensively trained and fine-tuned their version of the GPT model as well.

Additionally, Microsoft Copilot is coupled with the Bing API to source answers through the search engine. Importantly, this results in responses that cite their sources, making it possible for you to verify the information that was summarized by the LLM. With ChatGPT alone, it’s hard to know if you’re getting information from sources that are accurate, up-to-date, trustworthy and not made up by the model.

As a result, Microsoft has combined together the powerful capabilities of an LLM to parse the prompt, determine intent, and form a response in natural language with the capabilities of a search engine to retrieve and rank relevant resources and meaningful information. The three modes also create grounding prompts to encourage the response to be more “Creative”, “Precise” or “Balanced”, depending on whether you’re trying to get real answers or generate new content.

time to research

Typically, when I need help with something, my approach is to cast a wide net with a search engine across documentation, blogs and forum posts to get smarter about the problem. I feel confident about my boolean/keyword search abilities, and I find the process helps me educate myself more deeply about the area and also get inspired with new ideas to try.

Schitts Creek Comedy GIF by CBC - Find & Share on GIPHY

The other week, I got an opportunity to test out Copilot instead of my normal process. In my case, I was deploying a Power BI report to a UAT workspace which had an embedded canvas app. This was a new scenario for me – I’m lucky I can even spell Power BI – and unfortunately, the team who actually built the report and canvas app had long since rolled off the project. My objective was to ensure the report was pointed to the UAT app and published to the UAT workspace, and of course, without breaking anything!

I opened up Microsoft Copilot in my Edge browser and started prompting about how to manage Power BI reports that have embedded apps. Microsoft Copilot is multi-turn, meaning that each prompt I write should retain the context of all the other prompts until I hit “New Topic” or run out of turns (a maximum of 30).

how to update an embedded canvas app on a power bi report
how do you manage ALM when deploying the report to a new environment?
how do you manage ALM when a power app is embedded on a Power BI report?

The results were disappointing. Because it’s possible to embed these Power Platform components both ways, the search results and resulting response continually confused my situation with embedding a Power BI report on a canvas app, or even on a model-driven app.

Okay. A simple search term probably isn’t enough for my specific scenario. Let’s try again.

For the next prompt, I attempted to ground the LLM with details:

You are deploying a Power BI report to a new UAT workspace. The data source is Dataverse, and there are both Development and UAT environments for Dataverse already configured. The report also has an embedded canvas app, which needs to point at the UAT version of the app when it's deployed from Development to UAT. The report needs to follow ALM best practices. What steps should be taken to update the Power BI report with the UAT Power App and deploy to the UAT workspace?

The response was closer, but the guidance was too high-level and did not include specific steps (just “Update the Power BI report to point to the UAT version of the Power App”). However, the resource it linked was to a community post asking my exact question– a heartening improvement, but sadly no one had satisfactorily answered their question, either.

I asked a follow-up question: “Where do you change the app ID in the Power Apps visual in step 2?”

Here’s where the response started to lose fidelity. The first linked resource was related to embedded canvas apps on Power BI report, but otherwise was not related to my question. Copilot then formed a response with specific, step-by-step instructions that guided me to change the “App ID” in the Power Apps visual in Power BI. One problem, however: that parameter doesn’t exist on the Power Apps visual pane. The steps were hallucinated.

The response went on to contradict itself, thanks to the other linked resources. The blog post in the 3rd link described two different ways to replace the Power Apps visual with the UAT version, which solved my issue. (Shoutout to Ankit at Perficient for the assistance! 🙂)

conclusions

As with any generative AI model, the results are not guaranteed. Nor are they likely to be consistent; on a different day, I could ask the same questions and get completely different responses. (Who knows, if enough of my readers click through to Ankit’s blog, we could change the SEO rankings and bump it to the top of the results for anyone looking for the same information. 😄) The responses are only as good as the resources found with the Bing API, which in turn is only as good as the search term that Copilot picks out of my prompt, which in turn is at the mercy of how well I phrased my prompt. So, the “human in the loop” is a critical element to any generative AI interaction, both as prompt engineer and as savvy information consumer.

In judging the efficacy of doing research with Microsoft Copilot, then, I’ll focus on two areas:

Did I save any time?
How did the interaction make me feel?

#1 is important. Ultimately, I spent just as much time trying to get accurate detail through chatting with Copilot as I would have doing my own Bing searches and scanning through the results myself. “Change Power Apps visual in Power BI” as a search term surfaces that same blog post as the 3rd result. I certainly wouldn’t have picked up the irrelevant forum post, nor would I have lost time trying to find the non-existent “App ID” parameter in Power BI Desktop.

#2 is interesting, because the experience degraded my trust in Microsoft Copilot as a whole. I felt frustrated that it took multiple turns to get the appropriate level of detail, and that I ended up with hallucinated steps that led me in the wrong direction. (I also felt frustrated that there’s seemingly not a good way to swap out Power Apps visuals during the Power BI ALM process, but that’s a topic for a different day! 🙃)

I’m not sure if I would try Copilot again for tech research or troubleshooting, which leaves me wondering what I would use it for. Is Microsoft Copilot intended to be an LLM-infused search engine? Or a search-engine-infused LLM chatbot? I will probably still play around with it (while I wait on my org to roll out M365 Copilot), but the jury is still out on how useful it will be. Perhaps I need to explore the other 5 types of AI conversations (The 6 Types of Conversations with Generative AI) to find opportunities to leverage Copilot in my day-to-day.

test case #2: Image generation with DALL•E 3

image generation proliferation

There are a surprising number of places where Microsoft has opened up image generation capabilities, so there are a range of options if you want to try creating something yourself.

The first is within Microsoft Copilot (Bing Chat). Alongside the GPT-4 chatbot and Bing API search, they’ve also wired in DALL•E 3 to generate 4 different images based on your prompt. Because Copilot is multi-turn, you can refine the results with follow-ups like “Can you make it simpler?” or “Can you change it into a 3D render?” You can also leverage search results to ground the image you want to create.

Here’s some fun examples I created:

illustration of a lightbulb, with light teal details, surrounded by mid-century style sparkles in a watercolor style

illustration of a lightbulb, with light teal details, surrounded by mid-century style light purple-red sparkles in a watercolor style

Prompt	Image
The Joy of Power Apps logo is an illustration of a lightbulb, with light teal details, surrounded by mid-century style sparkles. Recreate this image in a watercolor style
Can you make the sparkles a light purple-red?
Can you make it a 3D render instead?

I had some mixed results while testing this out. Sometimes the chat kept insisting that it had generated a new image, but the image generation widget never loaded. I had to refresh and start over, which means I lost the context of my conversation. That could be frustrating if you were 15 turns into refining your image! Neither conversation context, nor the images created, are persisted outside of the session, so there’s no way to pick up where you left off.

Later, I ran into a situation where Copilot became testy with me for asking for changes.

*hold on, this whole operation was your idea!*

In another example, I tried to rely on search results to ground the image (such as “Can you make a watercolor illustration from the Joy of Power Apps logo?”). I must not be popular enough, because it hilariously sourced Matt Devaney’s post on 2,000 Free Power Apps Icons and generated some truly incredible and oddly horse-forward pieces.

*I thought Matt’s whole thing was cats, not horses*

Copilot manages the multi-turn image generation by updating the prompt that it sends to Image Creator. You don’t have as much control over the prompt, and it’s more challenging to see what changes were made to it. So, if the conversational interface is getting a bit tedious, there are a couple of other options to get direct access to an image generator.

You can access “Image Creator from Designer” through Bing. There are a few limitations to be aware of: first, there’s a 480 character limit to the prompt in this interface. Second, this user interface has “Boosts”, which limit how many images you can create. If you run out of boosts, you won’t be able to use Image Creator until your boosts refresh (or you buy more). [In the long-run, I’m guessing this will be the consumer-facing entry point to using Image Creator.]

The benefit is that you can tweak your prompt directly to make changes and generate 4 new images with each single-turn request. Plus, Copilot doesn’t sass you for asking for new options or prompt changes!

making a cartoon selfie shouldn’t be this hard

There’s a different interface for Image Creator which can be accessed directly from Designer. It’s worth noting that these interfaces are all still in Preview, so it’s not clear exactly what direction each access point to this image generation service will take as the product moves into its GA lifecycle. [I predict that the direct Designer interface will be the enterprise version, and is likely to have license requirements to use.]

For now, the direct Image Creator access doesn’t have the same character count or boost limitations as the Bing.com interface, so you can continually tweak your prompt until you’re happy with the image. This was perfect for taking up the “cartoon selfie” AI challenge that I’ve seen making the rounds of LinkedIn.

When you’re making an image of just anything, a few rounds of prompts will likely do it. But when you want to make a reasonable facsimile of yourself, the level of fidelity is naturally going to increase. Even as cartoon versions, we all want to look somewhat recognizably as ourselves. Which means the prompt you write has to include a lot of detail about you, down to the camera pose and the clothes your toon will wear.

I decided to recreate my iconic smirk that I use as my profile picture throughout my digital life. Through much trial-and-error, I refined my prompt to describe myself as smirking, got detailed about my haircut, and tried to get my head position dialed in.

The main problem with the results I was getting is that I am a person in a big body. Images representing me as thin don’t resonate with me, and I wanted my cartoon to reflect that. However, this ended up being a huge challenge with the image generator: about half of the time, prompts which included descriptors like “overweight”, “chubby”, “fat”, or “big” were caught in the responsible AI filter.

*it’s probably not my “clever eyes” that the filter has an issue with* 😤

It’s unsurprising, but hurtful all the same. Model training includes categorizing negative, toxic, dangerous and offensive topics, and somewhere along the line, that may have included descriptions and depictions of fatness. Artificial intelligence can only be as good as the data we raise it on and the biases of the humans who train it, and sadly, we still have a long way to go.

In the end, I lost track of how many different cartoon selfie versions I generated, but it was in the hundreds. I saved 4-5 of them, and here are my top 3. You can tell me how well you think I did.

conclusions

We are undeniably in the Age of Copilots now. But as we rocket forth, it’s important to remember that we are still the Pilots. The analogy is very apt: pilots still need to be well-trained, alert, and in control of the plane.

I also got to attend the Power Platform conference in October this year. At our booth, I had a lively conversation with a fellow Power Platform enthusiast who asked me why I don’t use ChatGPT to write my blog? The question shocked me a little bit, because it honestly would not occur to me to have an LLM do my writing for me. I used GPT-4 extensively to research this post, and create example content and take screenshots. But every single word was written by me, for the same reason that anyone does the hard work of getting their pilot’s license: I really enjoy it. We’re about to see generative AI crawl into every gap of our online lives, both at work and at home. But that doesn’t mean that Copilots are taking over the plane! We just need to define the right jobs that Copilot can do for us, and focus on the things we’re best at and the things we love doing.

Copilot isn’t the same as one of our human coworkers, so not only do we need to be smart about the prompts we write, we need to be even smarter about critically reviewing the content we get back to make sure we end up at the right destination. This goes beyond just checking that the information source isn’t out-of-date or making sure the model did not hallucinate. We need to carefully inspect our inputs and AI outputs for bias, especially when the outcomes could cause harm. This highlights the critical needs for diversity & inclusion even at the base level of model-training, and ethics standards for AI. This is an area I’ll be paying closer attention to as Microsoft, OpenAI and other major generative AI players continue to innovate.

As generative AI continues to expand into the different services in the Microsoft stack and beyond, you’ll have to experiment for yourself to see where it can best augment your existing habits. As Copilots mature, interfaces unify and products consolidate, we’ll hopefully see some improvements to the rough edges and commitment to responsible AI that doesn’t accidentally filter out aspects of people’s lived experiences.

I just hope that we’re ready.

Microsoft Designer was used to generate the featured image. The prompt used was:

watercolor illustration of a chubby female pilot flying a plane. She has a robot copilot who is being helpful. The pilot is pointing out the direction they are flying. the plane is teal and light purple-red colored and has a lightbulb painted on the tail. the plane is flying through space surrounded by mid-century golden sparkles and otherwise has a black background. the entire plane is in frame.

One thought on “are we ready for the age of copilots?”

Add yours

Bülent says:

December 7, 2023 at 4:53 am

Great article! Today I’m getting better results for my researches with ChatGPT as with Copilot Chat / Bing Chat Enterprise, but it really depends on what you type in and if you can trust the ChatGPT results. We will see better improvements in the next year and I’m looking for it!

LikeLiked by 1 person

are we ready for the age of copilots?

test case #1: researching with Microsoft Copilot

Microsoft Copilot versus ChatGPT

time to research

conclusions

test case #2: Image generation with DALL•E 3

image generation proliferation

making a cartoon selfie shouldn’t be this hard

conclusions

One thought on “are we ready for the age of copilots?”

Add yours

Leave a reply to Bülent Cancel reply

attributions

Blog Stats

Subscribe to RSS Feed

Follow Blog

Search for posts

Tags

test case #1: researching with Microsoft Copilot

Microsoft Copilot versus ChatGPT

time to research

conclusions

test case #2: Image generation with DALL•E 3

image generation proliferation

making a cartoon selfie shouldn’t be this hard

conclusions

Share this:

Related

One thought on “are we ready for the age of copilots?”

Add yours

Leave a reply to Bülent Cancel reply

attributions

Blog Stats

Subscribe to RSS Feed

Follow Blog

Search for posts

Tags