Friday, March 6, 2026

Make Gemini speak


In the previous story I showed how to build a webpage on which you could type a question. The question was then sent to a simple AI system, and the answer was spoken aloud.
You can read that story here: https://lucstechblog.blogspot.com/2026/02/2a-speaking-ai.html

It works, but it has some flaws. It relies on a special service to avoid a CORS error, and that service is sometimes so busy that you need to resend your question a few times. That is annoying.
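If you do stay with a busy service like that, the manual resending can be automated. Below is a minimal sketch, assuming a hypothetical helper called `withRetries` (it is not part of this program or the previous one). It calls an async function up to a given number of times and pauses between failed attempts.

```javascript
// Hypothetical helper (not part of the original program):
// call an async function up to `attempts` times, waiting `delayMs`
// milliseconds between failed attempts, and rethrow the last error.
async function withRetries(fn, attempts = 3, delayMs = 1000) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn(); // success: return the result immediately
    } catch (err) {
      lastError = err;
      console.warn(`Attempt ${i} of ${attempts} failed: ${err.message}`);
      if (i < attempts) {
        // give a busy service a moment before trying again
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

In a page you would wrap the fetch call in it, and throw an error inside the wrapped function when `res.ok` is false, so that a "server busy" reply also counts as a failed attempt.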

I had better luck with Google's Gemini AI.
I know that there are people out there who hate Google, but I actually like them. Their AI has a generous free tier, and it works great.
So I am going to rebuild the webpage with Gemini.

To use the following program you will need an API key from VoiceRSS. This story explains how to get one: https://lucstechblog.blogspot.com/2026/02/text-to-speech-with-voicerss.html

You'll also need an API key for Google's Gemini. This story explains how to get one: https://lucstechblog.blogspot.com/2026/02/build-webpage-with-gemini-ai-as-your.html
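Both keys end up as plain strings inside the webpage. A forgotten placeholder only shows up later as an error from Gemini or VoiceRSS, so a small check can catch it early. This is a sketch with a hypothetical function `checkApiKey`; it assumes the placeholders start with "PUT", as they do in the program.

```javascript
// Hypothetical helper (not part of the original program):
// refuse to continue while an API key is empty or still holds
// its placeholder text (the program's placeholders start with "PUT").
function checkApiKey(name, key) {
  if (typeof key !== "string" || key.trim() === "" || key.startsWith("PUT")) {
    throw new Error(`${name} API key is missing: replace the placeholder with your own key.`);
  }
  return key.trim(); // strip stray spaces from copy-pasting
}
```

You could call it at the start of the Send button's click handler, for example `checkApiKey("Gemini", Google_API_KEY)`, and let the existing `catch` show the message in the Response field.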


Sidenote.

JavaScript is a fun language that is not very difficult to learn, and you get instant results in your web browser. The language is useful for all kinds of projects, including IoT projects, several of which you can find on this weblog.
To make programming in JavaScript easier I collected over 500 tips and tricks and bundled them in a book. The book is distributed worldwide by Amazon.com.



So here is what we are going to build: a webpage on which you can type a question. That question is sent to Gemini, and the answer is spoken aloud. So turn up the volume of your computer's speakers and let's go.
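The Gemini half of that flow comes down to two steps: build a POST request for the `generateContent` endpoint, and pull the answer text out of the JSON that comes back. The sketch below splits those steps into two pure functions. The function names are hypothetical; the endpoint, the request body and the response path mirror the `fetch()` call in the program below.

```javascript
// Sketch of the Gemini steps as pure functions (hypothetical names).
// Endpoint and JSON shapes follow the fetch() call in the program.
const GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/models";

// Build the URL and fetch() options for a generateContent request.
function buildGeminiRequest(model, apiKey, question) {
  return {
    url: `${GEMINI_BASE}/${model}:generateContent?key=${encodeURIComponent(apiKey)}`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Gemini expects the question as a list of contents, each with parts
      body: JSON.stringify({ contents: [{ parts: [{ text: question }] }] })
    }
  };
}

// Take the text of the first part of the first candidate, if any.
function extractAnswer(data) {
  return data?.candidates?.[0]?.content?.parts?.[0]?.text || "(No response)";
}
```

In the browser you would use them as `const { url, options } = buildGeminiRequest(MODEL, Google_API_KEY, input);` followed by `extractAnswer(await (await fetch(url, options)).json())`.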

And here is the complete program.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Gemini Web Chat Demo</title>
  <style>
    body {
      font-family: system-ui, sans-serif;
      background: #f7f7f7;
      padding: 2rem;
    }
    #container {
      background: #fff;
      padding: 1.5rem;
      border-radius: 8px;
      box-shadow: 0 0 10px rgba(0,0,0,0.1);
      max-width: 600px;
      margin: auto;
    }
    textarea {
      width: 100%;
      font-family: inherit;
      font-size: 1rem;
      padding: 0.6rem;
      border: 1px solid #ccc;
      border-radius: 6px;
      resize: none;           /* user can’t drag resize */
      overflow: hidden;       /* hide scrollbar */
      min-height: 2.5rem;
    }
    button {
      margin-top: 0.5rem;
      padding: 0.4rem 1rem;
    }
    #response {
      margin-top: 1rem;
      white-space: pre-wrap;
      background: #f0f0f0;
      padding: 1rem;
      border-radius: 6px;
      min-height: 100px;
    }
  </style>
</head>
<body>
  <div id="container">
    <h2>Ask Gemini</h2>
    <textarea id="userInput" placeholder="Type your question..."></textarea>
    <br>
    <button id="sendBtn">Send</button>

    <h3>Response:</h3>
    <div id="response" contenteditable="true"></div>

    <div id="output"></div>
    <audio id="audioPlayer" controls></audio>

  </div>

  <script>
    const Google_API_KEY = "PUT-YOUR-GEMINI-API-KEY-HERE"; // Replace with your key
    const MODEL = "gemini-2.5-flash"; // Gemini model to use
    const textarea = document.getElementById("userInput");
    const sendBtn = document.getElementById("sendBtn");

    // Auto-resize textarea
    textarea.addEventListener("input", () => {
      textarea.style.height = "auto";
      textarea.style.height = textarea.scrollHeight + "px";
    });

    sendBtn.addEventListener("click", async () => {
      const input = textarea.value.trim();
      if (!input) return alert("Please enter a question.");
      const responseDiv = document.getElementById("response");
      responseDiv.textContent = "Loading...";

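      try {
        // Send the question to Gemini's generateContent endpoint
        const res = await fetch(
          `https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent?key=${Google_API_KEY}`,
          {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({
              contents: [{ parts: [{ text: input }] }] // send the user's question
            })
          }
        );

        const data = await res.json();
        // On a failure (e.g. a wrong API key) Gemini returns an error object instead of candidates
        if (!res.ok) {
          throw new Error(data?.error?.message || `Gemini request failed with status ${res.status}`);
        }
        const text = data?.candidates?.[0]?.content?.parts?.[0]?.text || "(No response)";
        responseDiv.textContent = text;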


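        // Keep only letters, digits and common punctuation before speaking;
        // anything else (emoji, accented letters, apostrophes) is removed
        let rec_ans = text || "";
        rec_ans = rec_ans.replace(/[^A-Za-z0-9 \n.,!?\\*+\-%@$&:<>()[\]{}"`]/g, "").trim();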

        //document.getElementById("output").textContent = rec_ans;
        //console.log("Received Answer:", rec_ans);

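        const VoiceRSS_API_KEY = 'PUT-YOUR-VOICERSS-API-KEY-HERE'; // Replace with your key
        const url = 'https://api.voicerss.org/';

        const params = new URLSearchParams({
          key: VoiceRSS_API_KEY,
          hl: 'en-us',            // language of the text
          v: 'Amy',               // voice
          f: '8khz_16bit_mono',   // audio format: sample rate, bits, channels
          src: rec_ans            // the cleaned-up answer to speak
        });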

        const ttsResponse = await fetch(url, {
          method: 'POST',
          headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
          body: params
        });

        if (!ttsResponse.ok) {
          console.error('❌ TTS request failed:', ttsResponse.status);
          return;
        }

        const audioBlob = await ttsResponse.blob();
        const audioUrl = URL.createObjectURL(audioBlob);

        const audioPlayer = document.getElementById("audioPlayer");
        audioPlayer.src = audioUrl;

        audioPlayer.play().catch(err => console.error("🎧 Playback error:", err));

        console.log("🎵 Playing audio...");


        // Automatically trigger a download of the audio file (no visible button)
        const downloadLink = document.createElement('a');
        downloadLink.href = audioUrl;
        downloadLink.download = `response_${Date.now()}.mp3`;
        document.body.appendChild(downloadLink);
        downloadLink.click();
        downloadLink.remove(); // clean up the temporary link

      } // end of try part

      catch (err) {
        responseDiv.textContent = "Error: " + err.message;
      }
    });
  </script>
</body>
</html>

I don't think there are any real pitfalls here, and the code is derived from the code in the previous stories, so I will not go into detail. Please check those stories for an explanation of the code.
You can also send me a message if you want an explanation of a specific part of the code.
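For a quick picture of the speech half, here is a sketch that pulls it out of the click handler into two pure functions. The names `cleanForSpeech` and `buildVoiceRssBody` are hypothetical; the regular expression and the VoiceRSS parameters are copied from the program. Note that the filter also removes apostrophes and accented letters, so "It's" is spoken as "Its".

```javascript
// Sketch of the speech preparation as pure functions (hypothetical names).
// The regex and the VoiceRSS parameters are taken from the program above.

// Keep letters, digits, spaces, newlines and common punctuation;
// drop everything else (emoji, accented letters, apostrophes, ...).
function cleanForSpeech(text) {
  return (text || "").replace(/[^A-Za-z0-9 \n.,!?\\*+\-%@$&:<>()[\]{}"`]/g, "").trim();
}

// Build the form body for the POST request to https://api.voicerss.org/
function buildVoiceRssBody(apiKey, text) {
  return new URLSearchParams({
    key: apiKey,               // your VoiceRSS API key
    hl: "en-us",               // language of the text
    v: "Amy",                  // voice
    f: "8khz_16bit_mono",      // audio format: sample rate, bits, channels
    src: cleanForSpeech(text)  // the text to speak
  });
}
```

The resulting `URLSearchParams` object can be passed directly as the `body` of `fetch()` with the `application/x-www-form-urlencoded` content type, just like the program does.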

How to use this.

Copy the complete code and paste it into your favorite editor. Replace the two placeholder API keys in the script with your own Gemini and VoiceRSS keys. Then save the file as gemini.html. You can use any name you like, as long as it ends in .html.
Then open the directory where you saved the program and click its icon.
Your default web browser will open with the webpage.


You can now type your question and press the Send button.
The answer will be written as text in the Response field and also spoken aloud.
The audio file is also saved to your Downloads folder, so you can use it in other projects.
Pressing the play button of the audio player at the bottom of the page replays that audio file. So if you misheard the answer, you can play it again without sending the question to Gemini again.

You can limit the length of Gemini's answer by adding to your question: "answer in x lines without giving an explanation", where x can be any number of your choice.
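If you use this trick a lot, the instruction can be added automatically. Here is a minimal sketch with a hypothetical helper `withLineLimit`. Keep in mind it is only a request in the prompt: Gemini usually follows it, but nothing forces the answer to be exactly that long.

```javascript
// Hypothetical helper (not part of the original program):
// append the "answer in x lines" instruction to the user's question.
function withLineLimit(question, lines) {
  if (!Number.isInteger(lines) || lines < 1) {
    throw new RangeError("lines must be a positive whole number");
  }
  return `${question.trim()} Answer in ${lines} lines without giving an explanation.`;
}
```

In the click handler you would then send `withLineLimit(input, 3)` as the `text` of the request instead of `input`.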

Have fun playing with this. I sure am!

Till next time

Luc Volders