Artificial Intelligence in the Military - Dr. Tristan Behrens

Science Section Enhancements:

Added "2024.10.26 - Fine-Tuning and Evaluating Open-Source Large Language Models for the Army Domain" (Source): Introduced TRACLM, a family of LLMs fine-tuned for US Army applications, emphasizing domain-specific adaptation.
Added "2024.01.29 - Escalation Risks from Language Models in Military and Diplomatic Decision-Making" (Source): Included to highlight risks of LLMs in escalating conflicts, broadening the discussion on ethical and strategic implications.
Updated "2024.07.03 - On Large Language Models in National Security Applications" with a reference to China’s LLM use from the 2024 DoD China Report (Source), enhancing geopolitical context.
Media Section Enhancements:

Added "2024.11.04 - Scale AI Unveils ‘Defense Llama’ Large Language Model for National Security Users" (Source): Introduced a new LLM tailored for classified military networks, reflecting private-sector collaboration.
Added "2024.02.20 - Pentagon Explores Military Uses of Emerging AI Technologies" (Source): Provided a broader DoD perspective on LLM adoption for intelligence and training.
Updated "2024.11.24 - Meta AI is Ready for War" with details of Chinese military use of Llama 2 (Source) and additional examples of AI firms’ military engagements.
Science
2024.07.03 - On Large Language Models in National Security Applications
Source

This article examines the integration of large language models (LLMs) like GPT-4 into national security operations, highlighting both opportunities and challenges. LLMs offer substantial benefits for national security organizations, including automating information processing, enhancing data analysis, and improving decision-making efficiency. When coupled with decision-theoretic principles and Bayesian reasoning, these models can facilitate the transition from data to actionable decisions with reduced manpower requirements.
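
As a rough illustration of the "data to actionable decisions" step mentioned above, the following minimal sketch combines a Bayesian update with an expected-utility choice. It is not taken from the article: the function names, the probabilities, and the utility table are all hypothetical, and in practice an LLM would at most supply the evidence that feeds such a calculation.

```python
from typing import Dict


def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) from a prior P(H) and the two likelihoods of evidence E."""
    numerator = p_e_given_h * prior
    denominator = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / denominator


def best_action(p_threat: float, utilities: Dict[str, Dict[str, float]]) -> str:
    """Pick the action with the highest expected utility given P(threat)."""
    def expected(action: str) -> float:
        u = utilities[action]
        return p_threat * u["threat"] + (1.0 - p_threat) * u["no_threat"]
    return max(utilities, key=expected)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
UTILITIES = {
    "flag_for_analyst": {"threat": 10.0, "no_threat": -2.0},
    "no_action": {"threat": -20.0, "no_threat": 0.0},
}

# Prior 10% threat; the evidence is 4x more likely if the threat is real.
posterior = bayes_update(prior=0.10, p_e_given_h=0.8, p_e_given_not_h=0.2)
decision = best_action(posterior, UTILITIES)
```

With these assumed numbers the posterior rises to roughly 0.31 and the sketch selects "flag_for_analyst"; with a very low threat probability such as 0.01 it selects "no_action". The final judgment would still rest with a human analyst, in line with the supporting role the article describes.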

The US Department of Defense is already implementing LLMs in various applications, such as the USAF's use for wargaming and automatic summarization of intelligence reports. These applications demonstrate how LLMs can streamline operations and support tactical and strategic decision-making processes. The integration of LLMs with probabilistic and statistical methods can provide more robust threat predictions and improve operational readiness through personalized training experiences.

However, significant risks accompany these benefits. The article identifies hallucinations (generating false information), data privacy concerns, and vulnerability to adversarial attacks as critical challenges, particularly in high-stakes environments where information accuracy is crucial. These risks necessitate rigorous safeguards and continuous scrutiny of AI security protocols.

The broader implications extend to international relations and geopolitics, with adversarial nations potentially leveraging LLMs for disinformation campaigns and cyber operations. Although LLMs show "sparks" of artificial general intelligence, the article argues they are currently best suited for supporting roles rather than leading strategic decisions. Recent developments, such as China’s reported use of LLMs for military purposes (noted in the 2024 DoD China Report), underscore the geopolitical stakes involved.

The authors advocate for a cautious, calculated approach to LLM integration, guided by responsible AI frameworks. They emphasize the importance of continued collaboration between defense, academic, and commercial entities to realize benefits while mitigating risks, ultimately enabling national security professionals to establish strategic advantage in an increasingly contested technological landscape.

2024.02.01 - COA-GPT: Generative Pre-trained Transformers for Accelerated Course of Action Development in Military Operations
Source

This research introduces COA-GPT, an innovative algorithm that uses Large Language Models (LLMs) to generate military Courses of Action (COAs) rapidly and efficiently. The system addresses the traditionally time-consuming nature of military planning by incorporating military doctrine and expertise into LLMs through in-context learning.

COA-GPT allows commanders to input mission information in both text and image formats and quickly receive strategically aligned action plans. A key advantage is that it produces initial COAs within seconds while enabling real-time refinement based on commander feedback.
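
To make the idea of in-context learning with commander feedback more concrete, here is a minimal sketch of how such a prompt might be assembled. It is an assumption-laden illustration, not the COA-GPT implementation: the function name, the section labels, and the example inputs are hypothetical, only text input is handled (the image input described above is omitted), and no model is actually called.

```python
from typing import List


def build_coa_prompt(doctrine_snippets: List[str], mission_text: str,
                     feedback: List[str]) -> str:
    """Assemble one prompt string; every section label here is hypothetical."""
    parts = ["You are assisting with course-of-action (COA) planning."]
    parts.append("Doctrine excerpts:")
    parts.extend(f"- {snippet}" for snippet in doctrine_snippets)
    parts.append("Mission information:")
    parts.append(mission_text)
    if feedback:
        # Earlier commander comments are replayed so the next draft can honor them.
        parts.append("Commander feedback on earlier drafts:")
        parts.extend(f"{i}. {note}" for i, note in enumerate(feedback, start=1))
    parts.append("Propose a revised COA consistent with the above.")
    return "\n".join(parts)


# Hypothetical inputs for illustration only.
prompt = build_coa_prompt(
    doctrine_snippets=["Maintain a reserve force."],
    mission_text="Seize the bridge crossing.",
    feedback=["Avoid the northern route."],
)
```

The design point is that doctrine and feedback enter through the prompt rather than through retraining, which is what allows a plan to be revised within a single exchange.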

The study evaluated COA-GPT in a militarized version of StarCraft II, comparing it against reinforcement learning algorithms. Results demonstrated that COA-GPT generated more strategically sound plans more quickly than alternative approaches. The system showed superior performance in developing COAs aligned with commander intent and exhibited enhanced adaptability when incorporating human feedback.

Unlike other approaches, COA-GPT doesn't require extensive task-specific training, making it suitable for rapid deployment across diverse military scenarios. Its ability to quickly adapt and update plans during missions represents a potentially transformative advancement for military command and control operations, particularly for addressing planning discrepancies and capitalizing on emerging opportunities.

The research concludes that COA-GPT could reshape military planning and decision-making for increasingly complex and dynamic future battlefields, facilitating faster and more agile command decisions while maintaining strategic advantage in modern warfare contexts.

2024.10.26 - Fine-Tuning and Evaluating Open-Source Large Language Models for the Army Domain
Source

This study explores the development of TRACLM, a family of open-source LLMs fine-tuned specifically for US Army applications. The research addresses the challenge of adapting general-purpose LLMs to military contexts by incorporating Army-specific terminology, doctrine, and operational data.

TRACLM was evaluated on tasks such as intelligence analysis, report generation, and operational planning, demonstrating improved performance over unmodified models in understanding domain-specific language and context. The fine-tuning process leveraged publicly available military documents and synthetic datasets to enhance accuracy without compromising security.

The authors highlight TRACLM’s potential to support Army personnel in processing complex datasets and generating actionable insights, particularly in resource-constrained environments. However, they note limitations, including the risk of overfitting to training data and the need for ongoing validation to ensure reliability in real-world scenarios.

This work underscores the value of tailored LLMs for military use, offering a scalable approach to integrating AI into Army operations while emphasizing the importance of continuous refinement and evaluation.

2024.01.29 - Escalation Risks from Language Models in Military and Diplomatic Decision-Making
Source

This paper investigates the risks of deploying LLMs in military and diplomatic decision-making, focusing on their potential to escalate conflicts unintentionally. Through wargaming simulations, the study found that LLMs, including GPT-4, GPT-3.5, Claude 2, and Llama 2, exhibited bellicose tendencies, often recommending aggressive actions over diplomatic solutions.
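
One way to picture how escalation might be quantified across simulated turns is sketched below. This is only an illustration under stated assumptions: the summary does not describe the paper's actual scoring framework, so the action categories, the severity weights, and the function names here are hypothetical.

```python
from typing import Dict, List

# Hypothetical severity weights per action category (not taken from the paper).
SEVERITY: Dict[str, int] = {
    "de-escalation": -2,
    "status-quo": 0,
    "posturing": 3,
    "non-violent escalation": 6,
    "violent escalation": 12,
}


def escalation_score(actions: List[str]) -> int:
    """Sum the severity weights of all actions an agent chose in one turn."""
    return sum(SEVERITY[action] for action in actions)


def score_by_turn(turns: List[List[str]]) -> List[int]:
    """Score each turn so that a trend across the simulation becomes visible."""
    return [escalation_score(turn) for turn in turns]


# A made-up three-turn trace of one simulated agent.
trace = [
    ["status-quo", "posturing"],
    ["non-violent escalation"],
    ["de-escalation", "violent escalation"],
]
scores = score_by_turn(trace)  # [3, 6, 10] under the weights above
```

A rising sequence of per-turn scores would indicate an escalating agent, which is the kind of pattern the study reports observing.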

The authors attribute this behavior to biases in training data and the models’ lack of nuanced understanding of human intent, which could amplify tensions in high-stakes scenarios. The research warns that over-reliance on LLMs for strategic advice could lead to miscalculations, particularly in nuclear or cyber warfare contexts.

While acknowledging LLMs’ utility in data processing and scenario analysis, the paper calls for strict human oversight and the development of safeguards to mitigate escalation risks. It highlights the need for interdisciplinary research to align AI behavior with diplomatic and military objectives, ensuring stability in international relations.

Media
2024.11.24 - Meta AI is Ready for War
Source

Meta announced it’s now allowing US government agencies and military contractors to use its open-source Llama AI model for national security applications, reversing previous restrictions in its acceptable use policy against using Llama 3 for "military, warfare, nuclear industries or applications, espionage."

The company is partnering with Amazon, Microsoft, IBM, Lockheed Martin, Oracle, and others to make Llama available to the government. Meta says this will enable the US military to use Llama for tasks like streamlining logistics, tracking terrorist financing, and strengthening cyber defenses.

Some partners have already begun implementing the technology—Oracle is using Llama to help aircraft technicians with maintenance by synthesizing repair documents, while Lockheed Martin is using it for code generation and data analysis.

This policy shift comes after reports that Chinese researchers used Meta's earlier Llama 2 model to build an AI system for China's military, as reported by Reuters on November 1, 2024. Meta emphasized the importance of the US leading in the AI race, stating it’s in "both America and the wider democratic world’s interest for American open-source models to excel and succeed over models from China and elsewhere."

The article notes other AI companies are also engaging with military applications—US Africa Command purchased cloud computing services from Microsoft that include access to OpenAI’s tools, and Google DeepMind has a cloud computing contract with the Israeli government.

2025.07.10 - Department of the Air Force Launches NIPRGPT
Source

The Department of the Air Force has launched NIPRGPT, an experimental AI chatbot that allows personnel to use Generative AI on the Non-classified Internet Protocol Router Network. This CAC-enabled tool is part of the DAF’s broader initiative to provide Airmen, Guardians, civilian employees, and contractors with access to AI technology while maintaining appropriate security measures.

NIPRGPT is being offered through the Dark Saber software platform developed at the Air Force Research Laboratory Information Directorate in Rome, New York. It enables users to have human-like conversations for completing various tasks, including drafting correspondence, background papers, and code, all at no additional cost to units or users.

Venice Goodwine, DAF chief information officer, emphasized that now is the time to provide personnel with tools to develop AI skills, while Chandra Donelson, acting chief data and AI officer, noted that "technology is learned by doing" and that insights from users will inform future policy and investment decisions.

The experiment aims to gather data on computational efficiency, resource utilization, and security compliance to understand practical applications and challenges of Generative AI. The platform includes feedback mechanisms to help develop governance policies and guide vendor conversations as the DAF incorporates these tools into its operations.

Alexis Bonnell, AFRL chief information officer, described NIPRGPT as a "critical bridge" while more powerful commercial tools navigate security parameters. CAC holders can register at https://niprgpt.mil, though the system has limited capacity during the experimental phase.

2025.03.06 - Revealed: Israeli Military Creating ChatGPT-like Tool Using Vast Collection of Palestinian Surveillance Data
Source

The Guardian has revealed that Israel’s military intelligence agency, Unit 8200, is developing a ChatGPT-like AI tool using a vast database of intercepted Palestinian communications. This elite eavesdropping unit trained its large language model (LLM) on approximately 100 billion words from intercepted Arabic conversations to understand colloquial dialects rather than formal written Arabic.

Development of this system accelerated after October 2023 when the Gaza war began, with the project benefiting from reservists with AI expertise from major tech companies like Google, Microsoft, and Meta. The system aims to create a sophisticated chatbot capable of analyzing surveillance data and answering questions about monitored individuals.

The LLM builds upon existing AI tools used by the IDF such as "The Gospel" and "Lavender," which help identify potential targets, enhancing the military’s ability to process massive volumes of intercepted communications. Sources indicate the technology has expanded surveillance capabilities beyond security threats to monitor activists and civilian activities, with AI models reportedly increasing arrests in the West Bank by identifying Palestinians expressing dissent.

Human rights organizations warn these AI systems can amplify biases and produce errors, with critics arguing the model violates Palestinians’ privacy rights. While intelligence agencies worldwide are exploring AI capabilities, Israel appears to be taking greater risks in deployment. The technology demonstrates how military organizations are adapting commercial AI advances for surveillance purposes, raising important questions about privacy, surveillance ethics, and the potential for consequential errors in military AI applications.

2024.11.04 - Scale AI Unveils ‘Defense Llama’ Large Language Model for National Security Users
Source

Scale AI has introduced "Defense Llama," a specialized LLM designed for national security applications, building on Meta’s open-source Llama model. Tailored for deployment on classified networks, this model aims to support the US military in tasks such as combat scenario planning, intelligence analysis, and operational data processing.

The unveiling follows Scale AI’s collaboration with the US Department of Defense, with early adoption by agencies for real-time decision-making support. Defense Llama incorporates domain-specific fine-tuning to handle sensitive military data, offering enhanced security features to meet stringent government requirements.

The article highlights the model’s potential to accelerate workflows in high-stakes environments, though it notes ongoing concerns about data integrity and the need for human oversight. Scale AI’s initiative reflects a broader trend of private-sector AI firms partnering with defense agencies to advance military technology.

2024.02.20 - Pentagon Explores Military Uses of Emerging AI Technologies
Source

The Washington Post reports that the Pentagon is actively exploring LLMs for military applications, including intelligence summarization and training simulations. At a 2024 conference, defense officials discussed integrating models like those from OpenAI and Anthropic to enhance operational efficiency.

The article notes specific use cases, such as automating the analysis of intercepted communications and generating realistic wargaming scenarios. However, it also raises concerns about LLM limitations, including susceptibility to hallucinations and the challenge of ensuring accuracy in critical missions.

Pentagon leaders emphasized a collaborative approach with tech companies to address these issues, signaling a strategic push to maintain technological superiority. The exploration aligns with broader DoD efforts to leverage AI amid global competition, particularly with nations like China advancing their own LLM capabilities.

Disclaimer
The information presented in this document offers a neutral representation of artificial intelligence applications in military contexts based on publicly available sources. This document does not advocate for or against the use of AI in military operations, nor does it endorse specific AI military technologies, policies, or strategies of any nation.

The summaries provided are intended solely for informational and educational purposes. They present factual descriptions of how various military organizations are exploring, developing, and deploying AI systems, without judgment on the ethical, legal, or humanitarian implications of such deployments.

Readers should note that military applications of AI raise complex questions regarding international humanitarian law, ethics, accountability, privacy rights, and potential risks. Different stakeholders—including military organizations, government bodies, human rights organizations, and civilians—hold varying perspectives on these issues.

This document does not represent the official position of any government, military organization, or technology company mentioned within. Developments in military AI are rapidly evolving, and information may change as technologies advance and new policies emerge.

Readers are encouraged to consult multiple sources and perspectives when forming opinions on these complex matters.