Team Malware Hunters

Team Members:  Jeremy Graves, Justin Johnson, and Quintin Donnelly

PROBLEM TITLE:  Associating IP Address with Malicious Activity

ORGANIZATION:  NSA-2

BACKGROUND

A critical step in any analysis of malicious network activity is researching the source and destination addresses. The most basic information, such as WHOIS and registry records, is easy to retrieve programmatically, but large quantities of rich data exist in less structured sources like forums, reports, and published blacklists and whitelists. The ability to rapidly gather this information and present it in a single report would effectively remove a lengthy step from the analysts' process.
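For instance, here is a minimal sketch of such a programmatic WHOIS retrieval (Python, standard library only; the choice of whois.arin.net as the registry server is our assumption, and it covers North American IPs):

```python
# Minimal sketch of a programmatic WHOIS lookup: the WHOIS protocol is a
# plain-text query over TCP port 43. whois.arin.net serves North American
# IPs; other registries (RIPE, APNIC, ...) cover other regions.
import socket

def whois(query, server="whois.arin.net", port=43):
    """Send a WHOIS query and return the raw text response."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(f"{query}\r\n".encode())
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode(errors="replace")

print(whois("8.8.8.8"))
```

The harder part of the problem is exactly what this sketch does not do: gathering, parsing, and correlating the unstructured sources.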

CHALLENGE

Analysts cannot synthesize information about an IP address or domain, associate it with malicious activity, and report on it in a timely manner.

BOUNDARIES

Must be able to function through a proxy or run as a web service

Develop a programmatic technique to research an IP address or domain for association with malicious activity, and present a summarized report. The type of information requested ranges from attributes like registry details to mentions of the IP or domain in security forums, incident reports, and blacklists.

PROBLEM SPONSOR

National Security Agency (NSA), Jill (jill4defense@gmail.com)

Hypothesis

Given some data (e.g., an IP address or URL), we can scan selected public resources for correlations to malicious activity or misuse and provide a summarized report of what is found.

Where are we?

We are making contacts and lining up interviews while researching technologies and sites that would be useful in addressing this problem. We found a security-related open source project that might have some usefulness. OWASP has chapters and members in multiple states throughout the U.S., and we sent emails off to C-level contacts in multiple locations. We have received contact and support from the problem sponsor, who is assisting in lining up interview candidates.

Interviews

June 12, 2017 – Bob Hopkins, Chief of Police at The University of Southern Mississippi.

We gave a brief overview of the problem and discussed what we might provide as a solution, then asked Mr. Hopkins whether what we will be developing would be useful to his office or its tasks. His response was that the majority of what they deal with is criminal-related, so IP and URL data and correlations wouldn't be that useful to their office.

However, if the same application could search other or similar sites for details such as an individual's name, phone number, address, or some other related detail and quickly provide a summarized discovery, that information might speed up their investigations into a crime.

In the interview we also discovered that some thought needs to go into where and how data is acquired. In their case, acquiring data from the wrong source might let a criminal get out of jail free, so to speak.

June 13, 2017 – Bob Wilson, Technology Security Officer at The University of Southern Mississippi.

We discussed threat intelligence and the problem given to our H4D group. Bob stated that researching IP addresses and URLs wasn't a high priority for him, and that identifying the maliciousness of addresses and URLs was tough.

For him, an application or device that monitored the network in real time for threats and provided threat intelligence was more important. The ability to inventory the nodes on the campus network would also be a useful tool. A big concern was security awareness and how to address it with so many users.

June 14, 2017 – # 2, Researcher within DoD, performs passive data collection and research on hacking tools.

The need is for a tool that, given an IP address, subnet, or URL, performs research and returns a report. The research should determine whether the address is a client, a server, or both; its web presence or history; SSL certificate validity and information; and who owns it. This interviewee said a maliciousness score for the address or URL would be nice if the formula used to compute the score was provided. The pain they are suffering is that they're "drowning in data," and the tool has the potential to alleviate some of that pain.
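Of those requirements, the SSL certificate check is the most straightforward to prototype. A minimal sketch (Python, standard library only; the hostname is illustrative):

```python
# Minimal sketch: fetch and inspect a host's SSL certificate. Validity is
# checked implicitly -- the TLS handshake fails if the certificate does not
# verify against the default trust store.
import socket
import ssl

def get_certificate(host, port=443):
    """Return the peer certificate for host as a dict (subject, issuer, dates)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()

cert = get_certificate("example.com")
print("issuer:   ", cert["issuer"])
print("notBefore:", cert["notBefore"])
print("notAfter: ", cert["notAfter"])
```

The client/server determination and web-history questions would need external data sources rather than a direct probe like this one.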

June 14, 2017 – # 4, Analyst or Data Scientist, primarily looks for suspicious traffic.

Identified that our application needs to use popular security forums in its research. The application shouldn't use active scanning, so as not to alert the potential offender(s). Felt that if the application was done right and received well, it could save up to a few hours per IP address or URL by removing some of the manual research required across open source and public-domain resources.

June 15, 2017 13:00 – # 6, Data Scientist / Crypto

Interested in internal and external inbound and outbound traffic for an IP address, as well as in the public registration information on it. The type of data traffic coming from or going to a specific IP address is of interest. Is it possible to add pattern-finding to this app within six weeks of development?

June 15, 2017 13:30 – # 7, Researcher – Developer / Infrastructure

Recommended pre-fetching data in a smart way. The application should be concerned with the specificity of IP addresses: is the address static or dynamic over time? Will the IP address we are researching still be there in one, two, four, or forty-eight hours, or longer?

June 15, 2017 16:00 – # 8, Problem Designer

Suggested looking into Common Crawl within AWS. Another useful input would be a text string (versus an IP/URL); the string might be packet data, a Windows registry key, or other tidbits that could help with malicious-activity research. The report needs to show where things are found, when they were posted, and how they were found. Review "cookie licking."
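To scope the Common Crawl suggestion, here is a minimal sketch against the public Common Crawl URL index (Python with requests; the collection name CC-MAIN-2017-22 is our assumption, as collections rotate and are listed at index.commoncrawl.org):

```python
# Minimal sketch: query the public Common Crawl URL index for captures of a
# domain. The collection name below is an assumption -- current collections
# are listed at https://index.commoncrawl.org/. Large domains may require
# paging, which is omitted here.
import json
import requests

INDEX = "https://index.commoncrawl.org/CC-MAIN-2017-22-index"

def crawl_records(domain):
    """Yield one index record per captured page under the given domain."""
    resp = requests.get(INDEX, params={"url": f"{domain}/*", "output": "json"})
    resp.raise_for_status()
    for line in resp.text.splitlines():
        yield json.loads(line)  # fields include url, timestamp, status, digest

for record in crawl_records("example.com"):
    print(record["timestamp"], record["url"], record.get("status"))
```

This also addresses the "where and when things were found" requirement, since each index record carries a capture timestamp.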

June 16, 2017 9:00 – # 9, Strategy Engagement and Outreach

Interested in inbound communications on networks. Would like the app to look at cross-site scripting and phishing. Provided the team lots of useful insight into the workings of things without giving too much detail. Environments are growing increasingly complex, making it harder to monitor them and find security holes. We need a way to increase efficiency and to identify and respond to problems or threats more quickly. What is the potential bad?

June 16, 2017 10:00 – # 10, Lab Science / Lead Researcher

Grab domain registration, WHOIS, and reverse-lookup data from multiple locations around the globe and look for places where the data differs. The application needs to provide all of the data as well as a summary wherever differences are identified; a rough sketch of this multi-vantage comparison appears after the output formats below.

Some newsgroups and sites to review and interact with:

Stack Exchange, CentralOps.net, CERT, DHS, YARA signatures, VirusTotal

Output formats: STIX/CSV/JSON
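A minimal sketch of the multi-vantage comparison (Python with dnspython 2.x; the resolver list and JSON shape are our assumptions, and querying several public resolvers only approximates true geographic diversity):

```python
# Minimal sketch: reverse-lookup an IP against several public resolvers and
# flag disagreements, emitting JSON. Requires dnspython (pip install dnspython).
import json
import dns.resolver
import dns.reversename

RESOLVERS = {"google": "8.8.8.8", "cloudflare": "1.1.1.1", "quad9": "9.9.9.9"}

def reverse_lookup(ip):
    """Return {resolver_name: [PTR records]} for the given IP address."""
    results = {}
    rev_name = dns.reversename.from_address(ip)
    for name, server in RESOLVERS.items():
        resolver = dns.resolver.Resolver()
        resolver.nameservers = [server]
        try:
            answer = resolver.resolve(rev_name, "PTR")
            results[name] = sorted(str(r) for r in answer)
        except Exception as exc:  # NXDOMAIN, timeout, etc.
            results[name] = [f"error: {exc}"]
    return results

results = reverse_lookup("8.8.8.8")
consistent = len({tuple(v) for v in results.values()}) == 1
print(json.dumps({"results": results, "consistent": consistent}, indent=2))
```

The same diff-and-summarize pattern would apply to WHOIS and registration data pulled from multiple registries.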

June 21, 2017 14:00 – # 21, Analyst

An intel analyst, now a researcher, who visually represents data. Finds the tool attractive if he can use the time saved to dive deeper into a situation. Interested in classifications, in the best practices that are used, and in getting recommendations on which queries are used to gather information. To visualize the data, format it for a sunburst diagram.

June 22, 2017 12:00 – # 22, Developer – C++/Ruby

JSON input/output, Google Protocol Buffers. Wants some customization by the analyst or researcher, such as saving queries and output formats for future use. JSON REST API.
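Several interviewees have now converged on a JSON REST interface. A minimal sketch of what that could look like (Python with Flask; the /research endpoint and its request/response fields are hypothetical, not a specification):

```python
# Minimal sketch of the JSON REST interface interviewees asked for.
# Flask app with one hypothetical endpoint; the field names are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/research", methods=["POST"])
def research():
    """Accept {"indicator": "<ip-or-domain>"} and return a summarized report."""
    body = request.get_json(force=True)
    indicator = body.get("indicator")
    if not indicator:
        return jsonify({"error": "missing 'indicator' field"}), 400
    # Placeholder: a real implementation would fan out to WHOIS, forums,
    # blacklists, etc., and summarize the findings.
    report = {"indicator": indicator, "sources": [], "summary": "not yet implemented"}
    return jsonify(report)

if __name__ == "__main__":
    app.run(port=8080)
```

Running the app as a web service also satisfies the boundary condition that the tool must function through a proxy or as a web service.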

June 22, 2017 13:00 – # 23, Data Scientist / # 23, Research Engineer

We’re trying to leverage open source data and academic environments to assist in solving problems.

Query interface into the data / JSON API.

Administrator forums for antivirus, Microsoft, and Cisco (forums for the application to research and search).

Security resources to search: DEF CON and Black Hat, (ISC)² / isc.org.

Contact folks with IEEE, IETF, OWASP.

June 22, 2017 14:00 – # 24, Analyst Lead

JSON REST API, MVP.

Things we’ve learned so far.

  • Analysts are using manual means to research suspicious traffic and identify anomalies or maliciousness.
  • 5-10% of their time throughout the year could be repurposed by automating the research process.
  • Mean analyst pay is approximately $44/hour, and there are 82,000 analysts per the Department of Labor (a rough savings estimate appears below).
  • There is static-in-time big data available in repositories such as Amazon Web Services that might be usable to speed up the automated research process.
  • We’re drowning in data and need a way to minimize the time to process and provide results.