Shiping Tang: China should enhance decision-making with Computational Social Science
The senior Chinese social scientist is running for Vice-President at the International Studies Association (ISA).
Today, we are sharing a condensed version of a paper by Shiping Tang from Fudan University, who is currently a candidate for the position of Vice-President at the International Studies Association (ISA). The paper discusses the application of Computational Social Science in enhancing China's decision-making processes. You can find the full paper at this link.
Shiping Tang is Distinguished Professor & Dr. Seaker Chan Chair Professor at the School of International Relations and Public Affairs (SIRPA), Fudan University.
Prof. Tang is widely recognized as one of the most influential Chinese social scientists on the global stage. Despite not being a native English speaker and working primarily outside North America and Europe, Prof. Tang has written five sole-authored books in English, published by leading academic and commercial presses:
The Institutional Foundation of Economic Development. Princeton University Press, 2022.
On Social Evolution: Phenomenon and Paradigm. Routledge/Taylor & Francis, 2020.
The Social Evolution of International Politics. Oxford University Press, 2013. Winner, “Annual Best Book Award,” International Studies Association, 2015.
A General Theory of Institutional Change. Routledge/Taylor & Francis, 2011.
A Theory of Security Strategy for Our Time: Defensive Realism. Palgrave-Macmillan, 2010.
He was the first Chinese and non-Western scholar to win a major book award in international relations:
Shiping Tang, 2013, The Social Evolution of International Politics. Oxford University Press. “Annual Best Book Award”, International Studies Association, 2015.
and the first Chinese scholar to join the editorial boards of leading journals in international relations:
International Security, 2021-
International Studies Quarterly, 2015-2020
Security Studies, 2015-
International Relations, 2021-
Vote for Shiping Tang at ISA's official link.
The Future of Computational Social Science and Scientific Decision-Making (计算社会科学与科学决策的未来)
Generally speaking, decision-makers grapple with two core challenges: a scarcity of information and a limited capacity to process the information that is available, which includes sorting and discarding irrelevant data. Historically, these issues have been nearly insurmountable, leading to a heavy reliance on expert judgment for complex decision-making. However, advancements in natural sciences and technology, notably in mathematics (with an emphasis on probability theory and statistics) and computer science, have progressively enabled decision sciences to harness these technological advancements. The emergence of Computational Social Science (CSS) promises to significantly mitigate these long-standing challenges. This development heralds a new era of transformative possibilities in the realm of scientific decision-making.
The article addresses two questions: First, in the field of decision CSS, should the emphasis be on machine learning techniques that focus on imitation, or should the approach be predominantly simulation-based, supplemented by machine learning? Second, should decision CSS predominantly depend on big data, or should it integrate a diverse array of data types for a more comprehensive approach?
Imitation Or Simulation?
The article suggests that decision CSS embraces two key technical approaches: machine learning, which focuses on imitation (learning patterns from past data), and computational simulation, which reproduces the process that generates an outcome. Considering the unique requirements of decision-making problems, future decision CSS should rely principally on simulation methods while incorporating elements of imitation techniques.
Machine learning attempts to predict social outcomes, such as political instability, coups, revolutions, and civil wars, typically rely on historical and news data. However, academics generally regard these methods as ineffective.
This ineffectiveness stems from three main reasons. Firstly, these studies often focus on structural factors, which evolve slowly. As a result, models based on these factors can only provide a broad sense of a country or region's stability but fail to offer precise and timely predictions. Secondly, social systems are dynamic and evolutionary, yet machine learning predictions often assume these systems to be relatively linear, leading to inaccuracies. Finally, efforts to predict social outcomes using deep learning involve a significant "black box" component, meaning they lack robust theoretical and empirical foundations from the social sciences.
The development of computers ushered in the era of computational simulations, which come in many forms. Given that social outcomes are forged by the behavior of agents and their interactions in specific environments, this article suggests that Agent-Based Modeling (ABM) is likely the most appropriate method for simulating emergent social outcomes driven by strategic behaviors.
One major limitation, though, is computational capacity. To simulate social outcomes that closely mirror real-life scenarios, a substantial amount of computing power is necessary. The surge in computational capabilities since the year 2000, particularly with the emergence of cloud computing, has significantly enhanced the potential of ABM. Today, ABM is applied to a wide range of issues, from social mobility and economic and social dynamics to complex social networks, including those behind terrorism and drug trafficking.
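To make the agent-based idea concrete, here is a minimal sketch in Python of a threshold model of collective action, a classic textbook ABM associated with Mark Granovetter. It is not the model used in the paper; the function name `simulate_threshold_model` and the threshold values are illustrative assumptions. Each agent joins a collective action once the share of agents already participating reaches that agent's personal threshold, so the aggregate outcome emerges from individual rules and interaction.

```python
def simulate_threshold_model(thresholds, max_steps=100):
    """Run a threshold cascade and return the number of active agents per step.

    thresholds: one value in [0, 1] per agent (hypothetical inputs). An agent
    becomes active once the share of active agents reaches its threshold.
    """
    n = len(thresholds)
    # Agents with a threshold of zero act unconditionally and start the process.
    active = [t <= 0.0 for t in thresholds]
    history = [sum(active)]
    for _ in range(max_steps):
        share = sum(active) / n
        # Active agents stay active; others join if the current share is enough.
        new_active = [a or (t <= share) for a, t in zip(active, thresholds)]
        if new_active == active:
            break  # No agent changed state: the system has settled.
        active = new_active
        history.append(sum(active))
    return history


# Ten agents with thresholds 0.0, 0.1, ..., 0.9: each joiner tips the next one.
uniform = [i / 10 for i in range(10)]
print(simulate_threshold_model(uniform))    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Same population, except the agent with threshold 0.1 now requires 0.2.
perturbed = [0.0, 0.2] + [i / 10 for i in range(2, 10)]
print(simulate_threshold_model(perturbed))  # [1]
```

A tiny change in one agent's disposition turns a full cascade into no cascade at all, even though the two populations look almost identical on average. This is the kind of nonlinear, emergent outcome that models built on slowly changing structural factors tend to miss.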
All Data, Not Just Big Data
For many, CSS is synonymous with big data-driven social science. However, this perspective isn't universally accepted among traditional social scientists, and this article also expresses reservations about equating the two.
This article suggests that "all data computation" is the right approach for decision CSS. It emphasizes an "all data" mindset, which goes beyond merely focusing on big data. Big data is part of the picture, but not the entirety; all data includes and transcends big data.
The "all data" mindset involves identifying the type of data required first and then using this data to address complex decision-making problems. This approach prioritizes the necessity and overall adequacy of data in solving a decision problem, instead of merely emphasizing the quantity or assuming that more dimensions and larger volumes of data are better. In essence, "all data" thinking starts from the problem needing resolution, rather than from the data itself. It can also be thought of as "sufficient data" thinking.
Therefore, the primary question for "all data" thinking is to determine what data is needed to solve a specific problem. Often, big data alone is insufficient to tackle complex decision-making issues; it needs to be combined with other fundamental demographic, economic, and political data. Solely relying on big data might suffice for minor decision problems, but not for more complex ones. Indeed, without focusing on the problem at hand, researchers might not collect certain data or even realize it exists and is collectible. Researchers must also be wary of relying heavily on big data in critical decision-making, as it is susceptible to contamination by false data and misinformation.
The "all data" mindset further addresses how different types of data can be utilized differently. For instance, macro and meso-level data are typically more useful for grasping the larger context, while accurate micro-level data, such as social media insights, hotel records, and phone numbers, can help in making precise predictions about specific individual and group behaviors. This indicates that each type of data serves distinct purposes and must be judiciously integrated to address varied problems.
In essence, when addressing specific and complex decision-making challenges, researchers require a mix of data types that combines foundational data with big data. Thus, the "all data" approach does not set a predetermined data scope for solving a particular complex decision issue; rather, it requires researchers to explore various combinations of data sources based on the specific research question at hand. Addressing complex decision-making problems requires a foundation in established social science theories, empirical studies, and data, along with an understanding of new big data sources and processing techniques. Only with this comprehensive approach can researchers effectively utilize the full spectrum of resources.
Two Cases of CSS
What decision-making challenges can CSS help address? This article proposes that CSS will be extremely valuable in decision-making for several scenarios: Firstly, it can forecast the general political trends of countries, particularly concerning domestic political stability. Secondly, for businesses, especially multinational corporations, it can predict the political and economic trajectories of countries they are considering investing in and offer insights for competitive strategy and site selection decisions. Thirdly, for individuals, it can aid in making decisions about high-stakes matters like travel and property investments.
Below are two specific examples at the national level to illustrate how computational social science can offer new approaches to scientific decision-making.
1. Election Prediction
Traditional election predictions mainly rely on opinion polls or processing data from various polls. Big data based on social media, in principle, is also a form of opinion polling. However, opinion polls (including big data from social media) suffer from four inherent biases that are hard to resolve: sampling bias (especially in cases requiring stratified sampling), self-selection bias among respondents, intentionally misleading responses from respondents, and the fact that individual support preferences may not necessarily translate into actual votes (due to issues like voter turnout and unforeseen events such as natural disasters or terrorist attacks).
For that reason, a research team at the Center for Complex Decision Analysis (CCDA), Fudan University, developed an election prediction platform based on ABM computer simulation that does not rely on pre-election public opinion polls at all.
Equally important, this prediction platform does not depend on big data from social media. Before the Cambridge Analytica scandal came to light, the research team was already cautious about using big data, especially from social media, to predict election outcomes. This is because big data is easily contaminated with fake data and misinformation, as demonstrated by events like Donald Trump's election and the UK's Brexit referendum. In other words, a significant amount of information on the Internet is fake news or deliberately misleading. Although some algorithms have been developed to identify and dilute the impact of fake news, the authenticity and reliability of data remain key factors affecting the accuracy of big data predictions.
Since 2016, the center has made five accurate predictions in a row: local leadership elections in the Taiwan region of China (2016 and 2020), county and city elections in the Taiwan region of China (2018), the U.S. Senate elections (2018), and the 2020 U.S. presidential election in six states. The team has released relative vote share predictions rather than just who wins or loses. In the six predictions made so far, the error between the team's predicted vote shares and the actual election results has ranged from less than 1% to 6%. Furthermore, ABM-based election predictions can provide such forecasts several months in advance, which is impossible with traditional polling or social media-based predictions. Finally, ABM-based election simulation predictions can also provide an approximate range of the impact of certain unexpected events on the final election results.
In the latest prediction, the research team released predictions of vote share proportions for the U.S. presidential election in six states on November 1, 2020, at 12:00 AM Beijing time (October 31, 2020, at 12:00 PM Eastern Time in the United States). The final election results confirmed the success of this prediction. After multiple tests of election predictions, we have reason to believe that election predictions based on data modeling and large-scale computational simulations are not only feasible but also a more effective technical approach. Additionally, ABM simulation results can help us better understand the hidden dynamics of election behavior and election politics, thus advancing social science research.
2. Combating Drug Networks
Recent academic research on computational simulations of South American drug networks and drug trafficking to the United States has been highly informative. Since U.S. President Nixon declared the "War on Drugs" in 1971, the United States has continuously strived to disrupt or significantly weaken the trafficking networks connecting South American source regions to the U.S. market. However, despite decades of efforts, the effectiveness of the "War on Drugs" has been far from satisfactory. In fact, despite substantial investments of manpower, resources, and funding to combat drug smuggling, the retail prices of drugs in the United States have consistently decreased (indicating a steady growth in drug supply). Furthermore, the geographic area of drug trafficking in the Western Hemisphere expanded from over 2 million square miles in 1996 to over 7 million square miles in 2017.
To understand the reasons behind the underwhelming outcomes of the "War on Drugs," research teams from various U.S. universities have collaboratively developed an ABM simulation platform called "NarcoLogic." This platform is based on geographical information systems and integrates complex social networks to simulate how drug traffickers make decisions in response to disruption attempts over time and space. It aims to explore the root causes of the ineffective disruption by examining the interactions between these decisions and U.S. enforcement measures.
The simulation platform incorporates multiple theoretical perspectives, empirical studies, media reports, and extensive on-site research conducted by scholars in the region. Parameters and validation processes are based on the most comprehensive and authoritative illegal cocaine flow data available. The simulation reproduces the dynamic changes in the ongoing "cat-and-mouse" game between drug traffickers and law enforcement agencies across time and space. Through visualizations provided by the simulation, the platform effectively illustrates the structure and functioning of drug trafficking networks, showcasing the emerging outcomes resulting from the strategic interactions between drug traffickers and law enforcement agencies.
The platform comprises three types of agents: drug trafficking networks (specifically South American drug cartels), transport networks (entities involved in drug wholesale and retail, which also form small-scale networks but can cooperate with different drug cartels), and interceptors (particularly the U.S. Drug Enforcement Administration). Each agent possesses specific characteristics or attributes that can be defined using data. Specifically, drug trafficking networks have 8 attributes, transport networks have 15 attributes, and intercepting agents have 4 attributes. The environment within this simulation platform has 9 different attributes. The behavior of each agent in the system is influenced by the behavior of other agents and environmental factors, and these interactions are characterized by four straightforward equations.
The platform simulates two primary scenarios. One scenario involves different drug trafficking networks and transport networks closely monitoring each other, infiltrating one another, and even engaging in communication. In other words, drug trafficking networks and transport networks spanning the Americas make decisions and take actions within a larger network. The other major scenario entails different drug trafficking networks and transport networks acting independently. Upon comparison, researchers found that the model based on network behaviors yielded superior results. This suggests that not only are individual drug smuggling groups social networks in themselves, but the entire drug trafficking system across the Americas constitutes a large-scale network. The underlying rationale is that drug cartels seek to balance risks and rewards from a top-down perspective, and both they and the transport networks adopt a global outlook, considering actions beyond just the local level. Another significant insight from this simulation is that, in an effort to evade law enforcement, drug smuggling groups tend to disperse their smuggling territories and employ more flexible and aggressive smuggling methods over time.
Leveraging ABM simulation, NarcoLogic can assist the U.S. Drug Enforcement Administration in conducting comprehensive assessments of different drug policy scenarios and their potential impacts on trafficker behavior. It also allows for an evaluation of the numerous collateral consequences associated with the militarization of the War on Drugs. Furthermore, this research has revealed through simulation that different drug trafficking groups are not isolated entities but rather competitive and collaborative entities within a vast network. Therefore, U.S. drug enforcement efforts must consider both local and global perspectives to effectively counter drug trafficking.
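The dispersal dynamic described above can be illustrated with a deliberately simple toy model in Python. This is not NarcoLogic and does not use its agents, attributes, or equations; the route rewards, the `intensity` and `mobility` parameters, and all function names are hypothetical. The sketch assumes only that enforcement pressure on a route grows with the traffic it carries, and that traffickers weigh reward against interdiction risk when deciding where to move.

```python
def step(shares, rewards, intensity, mobility):
    """Advance the toy model one period and return the new traffic shares."""
    # Assumption: interdiction risk on a route rises with its share of traffic.
    risk = [min(1.0, intensity * s) for s in shares]
    # Traffickers compare routes by reward discounted by interdiction risk.
    expected = [r * (1.0 - k) for r, k in zip(rewards, risk)]
    best = max(range(len(shares)), key=lambda i: expected[i])
    # A fixed fraction of the traffic on every route relocates to the best route.
    moved = [mobility * s for s in shares]
    new_shares = [s - m for s, m in zip(shares, moved)]
    new_shares[best] += sum(moved)
    return new_shares


def simulate(rewards, intensity, mobility=0.5, steps=4):
    """Start with all traffic on route 0 (assumed most rewarding) and iterate."""
    shares = [1.0] + [0.0] * (len(rewards) - 1)
    history = [shares]
    for _ in range(steps):
        shares = step(shares, rewards, intensity, mobility)
        history.append(shares)
    return history


def active_routes(shares, cutoff=0.05):
    """Count routes carrying more than `cutoff` of total traffic."""
    return sum(1 for s in shares if s > cutoff)


rewards = [10, 8, 7, 5]  # hypothetical payoffs per route, highest first

with_enforcement = simulate(rewards, intensity=0.8, steps=2)
print([active_routes(s) for s in with_enforcement])  # [1, 2, 3]

no_enforcement = simulate(rewards, intensity=0.0, steps=4)
print(active_routes(no_enforcement[-1]))             # 1
```

In this toy setting, traffic stays on a single route when there is no enforcement, and spreads across several routes once pressure follows the traffic. A real platform would add geography, network ties between groups, and calibration against observed flow data, none of which is attempted here.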
Lessons for China
Traditional decision consulting, which depends primarily on expert interpretation and judgment, is a product of the era before the information revolution. Lacking data and sufficient computing power, decision consulting often had to rely on expert judgment. This traditional approach is largely ineffective in helping modern states and businesses cope with highly complex and rapidly changing environments. Therefore, with the significant enhancement in data collection and processing capabilities, major countries around the world are investing substantial resources in building decision consulting systems based on data and computation, or CSS, to deal effectively with these environments.
In the field of strategic consulting based on CSS, the United States holds a leading position. Even before the advent of CSS, the core U.S. decision support systems and some crucial sectors had already entered the computational era. For instance, the RAND Corporation, initially supported by the U.S. Air Force, has long developed computer-based decision support systems. In fact, "The Logic Theorist," often described as the world's first artificial intelligence program, was developed with the support of the RAND Corporation.
Additionally, since the 1960s and 1970s, the U.S. military, through the Defense Advanced Research Projects Agency (DARPA) under the Department of Defense, has continuously supported this type of research and development. After decades of accumulation and exploration, the U.S. Department of Defense introduced, after 2000, a new generation of crisis early warning system based on CSS, known as the Integrated Crisis Early Warning System (ICEWS). Alongside the U.S. intelligence system, the universities involved in developing this system include Harvard University, Pennsylvania State University, the University of Maryland, and George Mason University. Although ICEWS still requires improvements, it has played a significant role in the U.S. response to potential crises, counter-terrorism, and support for military operations in Afghanistan and Iraq. The U.S. military has also supported much of the research proposed by CSS pioneer David Lazer. Clearly, the U.S. military has long paid close attention to developments in this area.
Another research and development initiative supported by the U.S. government is the "Project of State Reconstruction and Stabilization" (PSRS), originally housed within the Office of the Coordinator for Reconstruction and Stabilization (S/CRS) under the U.S. State Department. In 2011, this office evolved into the Bureau of Conflict and Stabilization Operations (BCSO). The BCSO's PSRS project is primarily aimed at the reconstruction and stabilization of certain U.S. allies or nations that may have experienced turmoil, possibly as a result of U.S. intervention. Key institutions contributing to this project include Stanford University, the University of California, San Diego, and the Brookings Institution. The insights generated by this project have been instrumental in guiding U.S. efforts in rebuilding and stabilizing Iraq, Afghanistan, and several African nations. Beyond the United States, several other developed countries are also pursuing similar research and development endeavors.
In summary, information collection and processing are essential functions of any decision support system. For China, the current system has basic data-gathering abilities and relatively elementary information-processing capacities. It significantly lacks comprehensive computational support and is far from conducting complex computational simulations. Hence, there is an urgent need for China to develop a strategic consulting system rooted in CSS. This requires the creation of a technological platform that combines social science, data technology, computer simulation, machine learning, and artificial intelligence. Such a platform would not only be a key technology but also a vital component of the nation's core capabilities.
The specific measures should include at least the following six aspects:
National Recognition of CSS: The state must profoundly recognize the extensive and far-reaching impact of CSS on strategic decision-making from a perspective of long-term stability and governance. Just like high-performance chips, decision CSS is one of the key dimensions of national hard power and must be given utmost importance.
Scientific Enhancement in Decision Consulting: Gradually increase the scientific requirements for national decision consulting. From the demand side, enhance the need for decision consulting based on CSS, and progressively increase the proportion of consulting reports that rely on CSS for deduction and judgment.
Increased National Investment: Given its strategic importance, there should be an increase in national investment in the research of CSS, especially in research and development aimed at prediction and deduction.
Talent Training and Academic Team Building: Encourage interdisciplinary academic training systems and collaborative platforms. CSS should be quickly established as an interdisciplinary subject or specialty linked with academic degrees.
Data and Algorithm Sharing: The development of CSS depends heavily on large-scale data and the sharing of data and algorithms. Currently, large-scale data holders are mainly enterprises and governments. The state should promptly require enterprises to share non-private user data (such as travel and consumption data with personal information removed) and encourage different government agencies and research institutions to establish shared platforms for data, algorithms, and models.
Encouragement of Private Enterprise Investment: Encourage relevant private enterprises to increase their research and development investments and strengthen university-enterprise cooperation.
China has a long way to go in the realm of CSS, yet it stands among the few economies with the requisite human, technological, and financial resources to robustly develop a strategic consulting system underpinned by this discipline. Achieving this goal demands more than just data technology and computational prowess; it also requires drawing upon the essential theories and empirical results from established social sciences. This means effectively merging social science with computer and data technologies. Consequently, integrating CSS with decision sciences isn't about replacing expert knowledge; rather, the expertise of social scientists forms a crucial foundation for the advancement of CSS. Collaboration across disciplines, with social scientists teaming up with computer scientists and data scientists and working in tandem with the government and business sectors, is key to making China's decision-making processes more scientific and to advancing the broader field of decision sciences.
Again, vote for Shiping Tang at ISA's official link.