Big data mining technique helps discover startups

In the high-tech sector, the hunt for startup companies to purchase is intense. The challenge is how to find these new businesses. To help make finding these startups easier and give analysts a competitive edge, a W. P. Carey professor developed a data-mining tool.

Zhan (Michael) Shi, assistant professor of information systems, was the lead author of the research, “Toward a Better Measure of Business Proximity: Topic Modeling for Industry Intelligence,” published in MIS Quarterly in December. His co-authors are Gene Moo Lee of the Department of Information Systems and Operations Management in the College of Business at The University of Texas at Arlington, and Andrew B. Whinston of the Department of Information, Risk, and Operations Management in the McCombs School of Business at The University of Texas at Austin.

The researchers discovered a need for a tool of this nature while looking at a dataset of high-tech startup companies. “It has very rich information for startup companies,” Shi says. “Very quickly we realized it’s hard to navigate this dataset.”

They collected the data from CrunchBase, an open and comprehensive source for high-tech startup activity including companies, people, and investors. CrunchBase keeps track of the industry by automatically retrieving and extracting information from news articles on technology-focused websites. Shi says that compared with other high-tech data vendors, CrunchBase has complete coverage of early-stage startups, especially those not yet funded by venture capitalists.

The research started from an information management perspective, Shi says. “How can we organize this information in a better way so researchers or market analysts can better navigate this dataset to understand the startup world,” Shi asked at the beginning of their research.

From there, Shi and his co-authors began to think of how the organized data could work as a tool. Shi used his previous research on social networks and people to come up with the idea of creating a network of companies. “We thought maybe we could apply the same approach to companies; we want to build a network of businesses using this dataset as a way for people to navigate it,” he explains.

Shi and his co-authors then developed a programming tool to extract data and use it to find companies that are similar in many different ways, comparable to how Facebook or Twitter analyzes a user’s activity and favorites to suggest similar people to follow. For example, if a large company is looking for a small startup business, other similar startups are not easily found, Shi says. The data was not organized in a way that made similar companies visible, which made the search difficult for large buyers, like Google or Microsoft, that spend billions of dollars every year to purchase small startups.

“We need a tool for people to identify the target better. Just like in the Twitter example, you want to find the right person to follow and, in this case, you want to find the right company to look at,” Shi says.

Professor Raghu Santanam, chair of the department of information systems, says Shi’s approach is novel and synergistic for the current and future digital business world. “Today’s businesses are leveraging a platform approach to expand into a diverse range of ventures and industries,” he says. “This phenomenon makes it hard to pin down organizations into static groups of industries and geographic locations.”

A new method utilizing and organizing big data was needed. “A data-oriented approach enables us to dynamically identify a firm’s positioning within a diverse range of business eco-systems,” Santanam explains. “As such, acquisitions and merger decisions can significantly benefit from this dynamic approach.”

In recent years, there has been an explosion of high-tech startups, Shi says, due to emerging mobile, cloud, and analytic technologies. In the academic world, researchers are trying to use different business similarity measures in studies of industry organization, marketing, and strategy, he explains, and in industry, market analysts are looking for tools to identify and target startups available for acquisition.

“These are the places we think our tool can be useful,” Shi says.

The researchers state in the paper that relatedness between companies is an important metric for analytics-minded managers to identify potential partners, competitors, and alliance or acquisition targets. According to the authors, the study shows that their proposed measure provides greater detail and is useful in high-tech merger and acquisition analytics.

To accurately organize the tens of thousands of small startup businesses in the dataset, Shi and his co-authors first looked at how to find similarities between the companies. They analyzed how two companies relate along several dimensions, such as overlap in business areas, geographic distance, employees, and investors.

“How can we measure the business similarity of two startups?” Shi asks.

The researchers gathered data between April 2013 and April 2015. The companies and their information were collected at the beginning of the period and limited to U.S.-based companies, excluding those missing basic information, such as founding date and business description. Their dataset contained 24,382 companies, many of which were privately held and early-stage startups.

To uncover topics in the business descriptions, the researchers used a text-mining technique called topic modeling, a statistical method that discovers abstract “topics” from an extensive collection of documents. They then converted each company’s description to a vector of numbers and compared the vectors of two companies to get a similarity measure, Shi says.

“The challenge is how can we translate a piece of text into some numbers, and that goes to the core of our research,” he says.
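The last step, comparing two vectors of numbers, can be illustrated with a minimal sketch. It assumes a topic model has already turned each business description into a vector of topic weights, and it uses cosine similarity as the comparison. The article does not say which comparison the researchers used, so cosine similarity is an assumption here, and the company names, topic weights, and function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def most_similar(target, topic_vectors, k=3):
    """Rank the other companies by similarity to `target`, highest first."""
    scores = [
        (name, cosine_similarity(topic_vectors[target], vec))
        for name, vec in topic_vectors.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical topic weights, not data from the study. Each position
# stands for one topic (say: mobile, cloud, analytics).
topic_vectors = {
    "startup_a": [0.7, 0.2, 0.1],
    "startup_b": [0.6, 0.3, 0.1],
    "startup_c": [0.1, 0.1, 0.8],
}

ranking = most_similar("startup_a", topic_vectors, k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data, startup_b ranks ahead of startup_c for startup_a because its weights fall on the same topics. Ranking by a score like this is what would let an analyst start from one company and browse outward to its nearest neighbors.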

To validate this method, Shi and his co-authors tracked three types of inter-firm interactions: mergers and acquisitions (M&As), where one firm acquires another; investments, where one firm invests in another; and job mobility, where an employee changes jobs from one company to another. Their dataset included a total of 1,689 M&As since 2008.

In their research paper, Shi and his co-authors state, “Our research shows how big data analytics can potentially transform competitive intelligence, particularly for the high-tech industry, where recent years have seen an ‘entrepreneurial boom’ characterized by the explosion of digital startups. Such explosion has made it ever more difficult to purely rely on individuals’ industry knowledge to depict the rapidly changing landscape of the startup world. Our empirical analysis demonstrates the potential of extracting economically meaningful information from publicly available, unstructured data through large-scale computation as well as the value of the proposed business proximity measure as an important metric in the analytics of M&A matching and as a search tool for navigating the networked startup world.”

Shi, who teaches master’s courses in the business analytics program at the W. P. Carey School of Business, will turn his sights to the crowded mobile app market on the Apple platform for his next research endeavor.

“I’m excited about the new data science techniques, and I’m thinking of ways to leverage these tools to transform how we do certain things in economics research,” he says.