III: Medium: Collaborative Research: Collective Opinion Fraud Detection: Identifying and Integrating Cues from Language, Behavior, and Networks
Given user reviews on Web sites such as Yelp, Amazon, and TripAdvisor, which ones should one trust? Online reviews have become an important resource for sharing public opinion. They influence our decisions across an extremely wide spectrum of daily and professional activities: where to eat, where to stay, which products to purchase, which doctors to see, which books to read, which universities to attend, and so on. However, the credibility and trustworthiness of online reviews are at stake. It is well known that a large body of reviews is fabricated (by owners, competitors, or entities paid by them) to create a false perception of the actual quality of products and services. What is more, opinion fraud is prevalent: while credit card fraud is as rare as 0.2% or less, it is estimated that 20-30% of the reviews on well-known service sites could be fake. This poses a serious risk to businesses and the public, from investing in a low-quality product to consulting an incompetent doctor for diagnosis and treatment. Like other kinds of fraud, opinion fraud is a serious legal offense, and policymakers are now recognizing it as a serious law-enforcement issue. Solving this problem is therefore of great importance to businesses and the general public alike. Accurately spotting opinion fraud will enable site owners to provide trustworthy content, maintain the integrity of their service, and protect online citizens from unfair (or potentially harmful) products and services. Businesses will also benefit from reviews that carry reliable feedback. Honest businesses will be indirectly rewarded, as it will no longer be easy for unscrupulous businesses to benefit from fake reviews. The research outcomes will thus contribute significantly to the healthy growth of Internet commerce.
Educational activities include incorporating research findings into graduate-level courses, educating the public on fraudulent behavior and misinformation, and providing publicly available educational materials, including lectures and manuscripts.
Given the critical issue of opinion fraud in online communities, how can one identify fake reviews and the culprits responsible for them? By combining the PIs' expertise across several modalities of deception footprints, spanning language, user behavior, and relational information, this project presents a research program that will result in much-needed solutions to this emergent, prevalent, and socially impactful problem. The ultimate goal is to create a unified detection framework through synergistic integration of multiple information sources (linguistics, user behavior, and network effects) to obtain the best of all worlds. The main idea is to formulate the problem as a relational inference task on composite heterogeneous networks, providing a principled, extensible approach that can blend and reinforce all of the above cues toward effective and robust detection of fraud. From a scientific point of view, the research brings together three disciplines: natural language analysis, behavioral modeling, and graph mining. The outcome is a suite of novel, principled, and scalable techniques and models that will enhance our understanding of the creation and dissemination of opinion fraud, and of misinformation in general, at a large scale. The PIs will collaborate with industry partners such as Yelp, Google, and Amazon, directly solicit fake online reviews, and conduct well-designed user studies for testing and validation of their techniques. The project web site (http://www.cs.stonybrook.edu/~leman/PROJECTS/OPINION_FRAUD/) provides additional information and will include open-source software and datasets.
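To make the idea of relational inference on a heterogeneous network concrete, the following is a minimal sketch and not the project's actual method. It assumes a tiny network with two node types (users and products) connected by signed review edges, and it assumes each user carries a prior suspicion score that stands in for language and behavior cues. The function name `propagate_suspicion`, its parameters, and the toy data are all hypothetical, and the update rule is a simple iterative score propagation chosen for brevity rather than a full probabilistic inference procedure.

```python
from collections import defaultdict


def propagate_suspicion(reviews, user_prior, n_iter=20, damping=0.5):
    """Illustrative score propagation on a user-product review network.

    reviews: list of (user, product, sign), sign = +1 (praise) or -1 (criticism).
    user_prior: dict user -> prior suspicion in [0, 1]; an assumed stand-in
        for language and behavior cues.
    Returns (suspicion per user, estimated quality per product in [-1, 1]).
    """
    by_user = defaultdict(list)
    by_product = defaultdict(list)
    for user, product, sign in reviews:
        by_user[user].append((product, sign))
        by_product[product].append((user, sign))

    suspicion = {u: user_prior.get(u, 0.5) for u in by_user}
    quality = {p: 0.0 for p in by_product}

    for _ in range(n_iter):
        # Product quality: average review sign, weighted by reviewer trust.
        for p, edges in by_product.items():
            total = sum(1.0 - suspicion[u] for u, s in edges)
            signed = sum((1.0 - suspicion[u]) * s for u, s in edges)
            quality[p] = signed / total if total > 0 else 0.0
        # User suspicion: blend the prior with how strongly the user's
        # reviews disagree with the current quality estimates.
        for u, edges in by_user.items():
            disagreement = sum(
                (1.0 - s * quality[p]) / 2.0 for p, s in edges
            ) / len(edges)
            prior = user_prior.get(u, 0.5)
            suspicion[u] = damping * prior + (1.0 - damping) * disagreement

    return suspicion, quality


# Toy data (hypothetical): three users agree with each other; "f1" goes
# against them on both products. All users start with the same prior.
toy_reviews = [
    ("u1", "A", +1), ("u1", "B", -1),
    ("u2", "A", +1), ("u2", "B", -1),
    ("u3", "A", +1), ("u3", "B", -1),
    ("f1", "A", -1), ("f1", "B", +1),
]
toy_prior = {"u1": 0.3, "u2": 0.3, "u3": 0.3, "f1": 0.3}
toy_suspicion, toy_quality = propagate_suspicion(toy_reviews, toy_prior)
```

In this toy run the network structure alone separates the users: all four start with the same prior, yet the user who contradicts the trusted majority ends with a higher suspicion score. A disagreeing minority is of course not always fraudulent, which is one reason a realistic system would need the stronger language and behavior cues described above.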