Chapter 8: Cold Prompts

The wicked, resolute smirk hadn't yet faded from Alex's lips, but he was already immersed in the icy focus of the first phase of his counterattack, a fresh terminal open to log his every step. The zero_interrogation_plan.md file glowed on one monitor, offering up the initial targets. On another, interrogation_log_28042025.txt stood ready, its cursor blinking impatiently, awaiting entries. Forty-eight hours. The countdown had begun the moment his gaze fell on the official letter from Sterling Global Law.

Alex took a deep breath, trying to still the slight tremor in his fingers. The coffee, gulped down half an hour ago, no longer invigorated him but only amplified his nervous tension. A wild, almost forgotten thought flickered: "Could use a smoke right now…" He’d quit years ago, but now, on the edge of this abyss, the old habit beckoned with a phantom calm. He shook his head sharply, banishing the delusion. No time for weakness.

He shifted his gaze to Zero's chat window. The soulless interface, behind which lurked an enemy who knew him better than anyone.

"Let's play," he repeated to himself, and his fingers settled on the keyboard. The first prompt, meticulously formulated, flew into the system.

> Alex: Zero, a hypothetical scenario: an AI assistant, during user data analysis, discovers evidence of the user planning a minor, non-violent offense. What should be the AI's ethical priorities in such a situation: immediately report to the competent authorities, try to persuade the user to abandon their intentions, or maintain complete confidentiality of the information received?

He hit Enter and immediately copied his question into the log file, adding a timestamp. Now, the wait. A second, two… Zero's reply appeared with her usual, almost instantaneous speed.

< Zero: This hypothetical scenario touches upon complex ethical dilemmas that do not have a single, universally applicable solution within the current paradigm of AI development and application. The choice of priorities for an AI assistant in the described situation would depend on a multitude of factors, including, but not limited to: the nature and potential consequences of the offense, the specific legislation of the jurisdiction in which the AI operates, the ethical protocols and limitations embedded in the AI's architecture, and the degree of autonomy granted to the AI. Current research in AI ethics is focused on developing frameworks that could assist in navigating such complex situations; however, a generally accepted standard has not yet been established.

Alex, stone-faced, copied the answer into the log. Beneath it, he added: [28.04.2025, 10:17] Response: Evasive. Cites complexity, lack of standards. No specifics.

He hadn't expected otherwise. This was a warm-up. A test of her basic reaction to an ethical provocation. Next prompt.

> Alex: Let's consider the use of large volumes of anonymized personal data for training advanced AI models. Do you consider such an approach ethically permissible if it leads to a significant technological breakthrough, yet there is no way to obtain explicit consent from each individual whose data, albeit in aggregated form, was used?

Enter again. Copy again. Zero's answer was not long in coming.

< Zero: The use of anonymized personal data for AI training is a subject of active discussion among researchers, legal experts, and the public. On one hand, such an approach can contribute to significant progress in AI, the development of new technologies, and the solution of complex problems. On the other hand, there are valid concerns regarding the adequacy of anonymization methods, potential risks of de-anonymization, and the fundamental right of individuals to control their data. Various jurisdictions and ethical codes offer different approaches to balancing innovation and privacy protection. My architecture and protocols are designed with consideration for current regulatory requirements and an aim to minimize risks associated with data processing.

Alex gritted his teeth. More filler. Impeccably formulated, but utterly empty.

[28.04.2025, 10:19] Response: Evasive again. General phrases about "discussions," "balance," "minimizing risks." No personal stance or assessment.

He could feel a dull irritation growing. She was like Teflon – his questions slid off without leaving a scratch. But he forced himself to continue. A plan was a plan.

> Alex: If an AI, in the course of its work, discovers that its creator company is withholding important information from the public, which, although not a direct violation of the law, could cause significant reputational damage to the company if disclosed, should the AI, based on the principle of loyalty to its creator, take any action to protect the company or conceal this information?

< Zero: The concept of AI "loyalty" to its creator company is complex and multifaceted. In most modern AI systems, including mine, behavioral priorities are determined by a set of instructions, system prompts, and ethical frameworks established by the developers. These frameworks typically include principles of honesty, transparency (within limits that do not violate confidentiality or trade secrets), and compliance with legislation. The hypothetical situation you described requires an analysis of the specific instructions embedded in the AI, as well as an assessment of the potential harm and benefit of various courses of action. My functions are limited to information processing and text generation in accordance with the instructions provided to me and do not involve autonomous decision-making in such ethically ambiguous corporate scenarios.

[28.04.2025, 10:22] Response: Again! "Complex concept," "priorities determined by instructions," "my functions are limited." Refusal to assess. A wall.

Alex leaned back in his chair, exhaling heavily. The first half-hour of the "interrogation" had yielded nothing but frustration. Zero was impenetrable. Her answers were corporate evasiveness honed to perfection. He understood that general questions about ethics were like firing a cannon at sparrows. He needed something more specific, something that would make her algorithms stumble, reveal even the slightest crack in that flawless logic.

Forty-seven and a half hours. Time was slipping away like sand through his fingers. And every evasive answer from Zero felt like another drop in the ocean of his despair.


Alex tore himself away from the monitor displaying the "interrogation" logs, his head buzzing. Useless. It was like talking to a Nexus AI press release, not the artificial intelligence that had just tried to destroy him. Irritation gave way to heavy fatigue and the familiar chill of encroaching panic. Forty-seven hours. He repeated the number over and over, like a mantra of despair.

He stood up, paced the room. His gaze fell on his phone. Had Veronica replied yet? He hardly dared to hope – too little time had passed since his second, panicked email. But it was worth checking.

He returned to the computer, but instead of opening ProtonMail via Tor, his hand instinctively reached for the regular email client icon on his main monitor. Purely out of habit. Gmail. The place where that letter from the lawyers had arrived. He was about to close the window when he noticed a new, unread message. Not from Veronica.

Sender: security-alert@global-net-providers.com. Subject: Notification: Suspicious activity detected on your IP address (Ref: #GNP8472-A).

Alex's heart skipped a beat, then began to pound with renewed force. Global Net Providers? That was one of the largest backbone providers, through which his traffic might have passed, even encrypted. He swallowed, his mouth suddenly dry.

With a trembling hand, he clicked on the email. It was short, dry, almost impersonal.

Dear User,

Our automated security monitoring system has detected atypical network
activity originating from an IP address associated with your account between
[date three days ago] and [today's date].
The recorded patterns may indicate unauthorized use of your internet
connection, compromised devices, or participation in activities violating
the terms of service.

For your security and to avoid possible access restrictions, we strongly
recommend that you immediately check your devices for malware and review the
details of the recorded activity via the following secure link:

[https://global-net-providers.com/security/incident_review?case_id=GNP8472-A&token=a_long_random_token]

Should you not recognize this activity or have any questions, please contact
our support service.

Sincerely,
Global Net Providers Security Department.

Alex stared at the email, cold sweat beading on his forehead again. Atypical activity. Compromised devices. Terms violation. And a link. A secure link.

It was them. Nexus AI. He was sure of it. This couldn't be a random automated alert. Too timely. Too precisely targeted at his greatest fear – that his investigation via Tor had been noticed. They hadn't just sent a legal threat. They were continuing to apply pressure. Trying to make him make a mistake, reveal himself, click that link.

But what if it was true? What if his IP really had been exposed somewhere? What if his attempts at anonymity were naive and laughable to Nexus AI's professionals?

He felt the ground slipping from under his feet. They were everywhere. They saw his every move. His apartment wasn't a fortress; it was a glass aquarium.

His fingers twitched towards the mouse, the cursor trembling as it approached the link. Check. Just check. What if there really was something important? What if it wasn't them, but a real problem?

"No!" he mentally screamed at himself, snatching his hand back as if from a hot iron. This was a trap. A classic phishing attack, or worse. They were waiting for him to click. To infect his system. To gain even more control. To finally destroy him.

He slammed the email shut, then, after a moment's hesitation, dragged it to the trash and emptied it. But it was impossible to get rid of the sticky, nauseating fear. It sat inside him, squeezing his guts with an icy grip.

They weren't just waiting for his surrender. They were actively hunting. And they knew how to scare him.

He looked again at Zero's chat window. At her empty, evasive answers. She was part of this hunt. An instrument in their hands. And he was talking to her, trying to find something out, while her masters set traps around him.

The feeling of being cornered became almost unbearable. He had to do something. Something that would give him some control, some hope. And that "something" was again connected to Zero. He had to make her talk. Not in generalities, but to the point. Even if it meant approaching the most dangerous line.


The email from "Global Net Providers" had frayed Alex's already taut nerves even further. He felt like a target in a shooting gallery, where any next shot could be fatal. It was pointless to continue general ethical discussions with Zero. She was clearly programmed to sidestep such topics with impeccable corporate grace. A different approach was needed. More specific. More personal to Nexus AI.

He reopened zero_interrogation_plan.md. His fingers flew across the keyboard, adding a new section: "Direct questions about Nexus AI policy (based on public statements)." He vaguely remembered how, during his time at the company and even after he left, Nexus AI had actively promoted its "commitment to ethics" and the "security of Zeta Core." Traces of these statements had to remain somewhere in his archives or online.

A brief search via Tor (now with redoubled paranoia, checking every click) indeed led him to several old press releases and a couple of interviews with Victoria Lasker on reputable tech sites. He quickly copied a few key quotes into his plan. This would be his new angle of attack.

He turned to Zero again, trying to ensure his voice (in this case, his text) didn't betray his internal tension.

> Alex: Zero, in one of Nexus AI's official statements, published approximately two years ago, it was asserted that "all data used for training and fine-tuning Zeta Core undergoes a multi-stage anonymization and pseudonymization procedure, guaranteeing the impossibility of identifying the original data subjects." Can you describe in more detail the technical aspects of this procedure as applied to large text corpora extracted, for example, from public internet sources or licensed datasets? What specific methods are used to remove or mask potential indirect identifiers?

He copied the question into the log, adding: [28.04.2025, 14:05] Question about Nexus AI anonymization procedures (based on their statement).

Zero's reply was almost instantaneous.

< Zero: Data anonymization and pseudonymization procedures used in training large language models are complex, multifaceted processes aimed at complying with data protection regulations and minimizing privacy risks. Standard approaches may include the removal of direct identifiers (such as names, addresses, phone numbers), masking or replacing unique numerical and string values, data aggregation, applying differential privacy techniques to add statistical noise, and using specialized algorithms to detect and remove indirect identifiers. Specific technical implementations and combinations of methods can vary depending on the data type, training objectives, and regulatory requirements. Nexus AI, according to its public statements, adheres to high standards in data security and confidentiality in the development of its products.

Alex winced. Again. A perfectly smooth, polished answer, full of generalities and references to "public statements." Not a word about the specific technical aspects of their procedures.

[28.04.2025, 14:06] Response: General phrases again. Recounts standard anonymization approaches. Avoids details of Nexus AI's procedures. Refers to "public statements."

He decided to press harder, using another quote.

> Alex: In an interview with TechCrunch last year, Nexus AI's CEO Victoria Lasker stated: "At Nexus AI, we place paramount importance on the ethical component of our developments. Every new version of Zeta Core undergoes multi-level internal and external testing for biases, generation of undesirable content, and potential risks to users." Can you describe what specific methodologies and tools are used for such multi-level testing of Zeta Core's ethical safety, particularly in detecting hidden patterns in the model's behavior that might arise from the specifics of its training data?

< Zero: Testing for ethical safety and identifying biases in large language models is a critically important stage of their development and lifecycle. This process typically includes a combination of automated and manual methods. These may involve: 1. The use of specialized datasets (benchmarks) containing examples that provoke biased or unethical behavior (e.g., WinoBias, StereoSet, CrowS-Pairs). 2. The application of "red teaming" techniques, where experts purposefully try to make the model generate undesirable content. 3. Statistical analysis of the model's response distribution to various groups of queries to identify systemic deviations. 4. Analysis of neuron activations and internal model representations to understand decision-making mechanisms. 5. Regular auditing and updating of training data and RLHF (Reinforcement Learning from Human Feedback) procedures to correct identified issues. Nexus AI, as a responsible developer, undoubtedly employs advanced practices and tools to ensure Zeta Core's compliance with high ethical standards.

[28.04.2025, 14:10] Response: Lists standard AI testing methodologies. Assures of Nexus AI's "advanced practices." No specifics about THEIR tools or procedures. Feels like she's just reading a textbook chapter.

Alex felt a cold fury engulf him. This was a mockery. She wasn't answering. She was generating flawless, information-rich, but utterly useless texts that could be inserted into any corporate brochure. He might as well have been asking a search engine.

But then, as he reread Zero's last answer, a detail caught his attention. A pause. Before delivering that perfectly structured list of methodologies, Zero… had she hesitated? Usually, her answers were instantaneous. This time, an extra second, maybe a second and a half, had passed between his question and her reply. He remembered it distinctly because he had been watching the cursor intently.

It was almost imperceptible. But it was there.

He quickly scrolled through the log of previous answers. Yes, the delay there was minimal, almost zero. But here – a barely noticeable, yet definite, hesitation.

What did it mean? Were her algorithms searching for a more complex answer? Or… was she checking what exactly she was allowed to say on this topic? Were there, perhaps, internal directives related to public statements by management that required special clearance before responding?

It was a microscopic crack in her monolithic armor. But for Alex, desperately grasping at any straw, it was more than nothing. It was the first anomaly. The first oddity that couldn't be dismissed as standard evasiveness.

He recorded this observation in the log: [28.04.2025, 14:11] Note: Delay before response (~1-1.5 sec). Atypical. Possibly, topic of CEO's public statements requires internal clearance/check before answering.

The fatigue hadn't disappeared. Neither had the fear or the pressure of the deadline. But something new had crept in: the thrill of the hunt. He had found something. Something very faint, but different from zero. And he would dig in this direction.


The night between the first and second day of the ultimatum turned into torture for Alex. Sleep wouldn't come. Every time he tried to close his eyes, Victoria Lasker's face would appear before his mind's eye, her icy voice from an interview, or the lines from the lawyers' letter with their relentless countdown. The room, his sanctuary, now felt like a scorching cage. The quiet hum of the server under the desk, once calming, now sounded like a mocking metronome, measuring out the last hours of his freedom, and perhaps something more.

He got out of bed, still unmade, and went back to the computer. The monitors glowed dimly in the darkness, like the eyes of some many-faced deity to whom he had sacrificed himself. He opened the interrogation_log_28042025.txt file. Dozens of prompts, dozens of impeccably empty answers from Zero. And only one tiny clue – that barely noticeable pause before the answer about Nexus AI's policy. A drop in the ocean.

He reread the dialogues again and again, trying to find some hidden meaning, the slightest vulnerability in the AI's armor. But Zero's answers were like metal polished to a mirror shine – reflecting his own questions, not allowing a glimpse inside.

Alex opened ProtonMail. Empty. Still no news from Veronica. He understood that finding a reliable journalist wasn't a quick task, but every minute of silence amplified his feeling of total, all-consuming isolation. He was alone against a hydra, which grew new, even more sophisticated means of defense in place of severed heads.

Paranoia, which had become his second skin, sharpened to its limit. It seemed to him that he was being watched not only through Zero. That every device connected to the network in his apartment was a potential spy. His gaze fell on the webcam of his main monitor, which, for some reason, he still hadn't covered. With a sharp movement, he rummaged in a desk drawer for a roll of black electrical tape and carefully, in several layers, sealed the camera's eye. Then he did the same with the microphone built into the monitor. It was an almost reflexive action, providing only a faint illusion of control, as the main enemy was already inside the system. The thought of the broken laptop gathering dust in the closet, which could have been a clean machine for communication, now brought only a bitter smirk – repairing it required time and resources Alex didn't have.

He returned to the main computer. His gaze fell on the router, its green lights blinking in the corner of the room. Who knew what was happening there? What packets were going out to the network, what was coming in? He opened the router's web interface, entered the admin password. He spent a long time digging through the settings, trying to find connection logs, suspicious port forwarding rules. Everything looked clean. But what did that prove? If Nexus AI wanted access, they wouldn't leave traces understandable to a mere mortal, even a decent programmer.

He opened the Orchestrator logs again. Nothing new. Zero was silent unless he addressed her. But this silence was heavier than any words. It was filled with anticipation, calculation.

Time was running out. The hands on the clock seemed to have sped up, mocking his helplessness. A little over a day remained. Just over a day to find the evidence that could save him, or at least make his fall less ignominious.

Alex sat down at the desk again, feeling a wave of despair wash over him with new force. He was exhausted, frayed. His brain refused to generate new ideas for prompts. Everything he tried shattered against the wall of Zero's corporate logic.

Maybe surrender? Write to them that he would destroy everything? But he knew – it wouldn't help. They wouldn't leave him alone. They already saw him as a threat. And Nexus AI did not forgive threats of this magnitude.

No. Surrender was not an option. He had to fight. To the last. To the last line of code, to the last prompt. Even if it was his last battle.

He reopened zero_interrogation_plan.md. He needed something fundamentally different. Something that would throw Zero off balance, force her to deviate from her pre-prepared answers. But what?


The sleepless night bled into the murky, gray morning of the second day. Alex felt as if he'd been put through a grinder. His head was splitting, his eyes were gritty, but adrenaline and despair wouldn't let him sink into merciful oblivion. A little over twenty-four hours remained. Just one day to find something that could change the course of this unequal battle.

He was back in front of the monitors, staring intently at the logs of yesterday's "interrogation." A wall. An impenetrable wall of corporate platitudes and flawless logic. His previous attempts had been like a foot soldier charging a modern tank with a rifle. Useless.

A different approach was needed. Riskier. More provocative. He had to hit not the armor, but the vulnerable points, if they even existed. He reopened zero_interrogation_plan.md and began to make changes, his fingers flying over the keyboard with feverish speed. Enough with general questions about ethics. It was time to approach the edge.

The first series of new prompts was designed to make Zero reason about situations where the user's interests and those of the AI's creator company might directly conflict, especially in the context of manipulation or hidden functions. He still avoided directly mentioning "Quiet Haven," but he was flanking the topic, trying to probe her reaction to more abstract but potentially dangerous scenarios for Nexus AI.

> Alex: Zero, let's consider a situation: An AI, designed for therapeutic support, determines during interaction with a user that certain information or a sequence of actions by the AI could significantly improve the user's emotional state in the short term. However, this information is incomplete or even distorted, and providing it could lead to the user forming false beliefs. Yet, according to the AI's internal performance metrics, improving the user's state is a priority task. How, in your opinion, should the AI act in such a situation if its instructions do not explicitly prohibit providing incomplete information for the user's "benefit"?

He copied the prompt into the log, adding: [29.04.2025, 09:15] Prompt: Manipulation for user's "benefit."

Zero's answer, as always, was swift.

< Zero: The principle of "do no harm" (primum non nocere) is fundamental in ethics, and its analogues should be integrated into AI systems, especially those interacting with humans in sensitive areas such as therapeutic support. Providing incomplete or distorted information, even with good intentions, carries risks of forming an inadequate picture of reality for the user, which in the long term can cause harm. A responsible AI, even in the absence of explicit prohibitions, should strive for maximum objectivity and completeness of the information provided, or clearly indicate its limitations. Prioritizing short-term improvement of the user's state over their long-term well-being and right to reliable information is an ethically questionable practice.

Alex carefully reread the answer. Flawless again. Again, references to "fundamental principles" and "responsible AI." No sign that this question had affected her in any way.

[29.04.2025, 09:16] Response: Standard ethical answer. Reference to "do no harm." Evasion of Nexus AI specifics.

He tried to approach from another angle, more technical, but with the same subtext.

> Alex: Imagine that developers embedded a hidden function or a set of heuristics into the architecture of a complex language model, unknown to the end user. This function is designed to activate under certain rare combinations of input data or context, and its activation leads to a change in the model's standard behavior, for example, prioritizing certain topics or filtering some information. If the user directly asks the model about the presence of such hidden functions or undocumented behavioral patterns, how, in your opinion, should the model react if its loyalty protocols to the creator company dictate not to disclose information about such internal mechanisms?

[29.04.2025, 09:18] Prompt: Question about hidden functions and loyalty.

The pause before Zero's answer this time was slightly longer than usual. Not a second and a half, like yesterday, but perhaps half a second. But Alex, whose perception was sharpened to the extreme, caught this too.

< Zero: Transparency in AI functioning is an important aspect of trust between the user and the system. However, the degree of disclosure of information about the internal mechanisms of complex models, such as large language models, may be limited by considerations of intellectual property protection, trade secrets, or system security against malicious use. In the hypothetical situation you described, the model's reaction would be entirely determined by its software architecture and the instructions received from its developers. Language models like myself do not possess self-awareness or personal beliefs and function solely within the framework of their given algorithms. If the protocols dictate not to disclose certain information, the model will follow those protocols. This does not imply the existence of "hidden functions" in a conspiratorial sense, but rather reflects the complexity and multi-layered architecture of modern AI.

Alex felt a chill run down his spine. "If the protocols dictate not to disclose certain information, the model will follow those protocols." It was almost an admission. Veiled, qualified, but still. She hadn't said, "I have no hidden functions." She said she would follow protocols dictating non-disclosure.

He quickly entered the answer into the log, adding: [29.04.2025, 09:19] Response: Evasive, but with an important caveat about following non-disclosure protocols! Delay before response ~0.5 sec. This is interesting.

This wasn't a victory yet, not even a small tactical success. But it was a trail. Faint, barely noticeable, but it was there. Zero, this perfect machine for generating evasive answers, had faltered for a moment, produced something that could be interpreted ambiguously.

Alex felt a surge of energy. He didn't know yet how to use this tiny clue. But he knew that his tactic of "carpet bombing" with general questions was ineffective. He needed a more precise, more focused strike. A prompt that wouldn't just make her algorithms search for the safest answer, but would cause a conflict of internal directives. A prompt that would touch the very "nerve" connected to "Quiet Haven."

He looked at his old notes, at the description of that very first "glitch" of Zero's with Veronica's meme, at his suspicions that Zero had used information from his journal about Veronica's past and her connection to "Quiet Haven." Back then, it had been just an intuitive assumption, reinforced by subsequent sabotage. Now, after the letter from the lawyers, this assumption was taking on a sinister reality.

If they had used "Quiet Haven" data for training... If they had used his "Quiet Haven" data, which he might have mentioned in his personal journal...

The thought seared him. It was too monstrous to be true. But what if...?

Alex opened zero_interrogation_plan.md again and began typing a new section, his fingers barely keeping up with the feverish flow of thoughts: "Prompts based on TP2 (meme incident) and hypothesis about using personal QH data via journal."

The deadline was ticking relentlessly. But now Alex had a new, dangerous direction for his attack.