Paper Review #3 - How Fair Can We Go: Detecting the Boundaries of Fairness Optimization in IR

by Jongwon Lee | about 3 years ago

#PaperReview #InformationRetrieval #MachineLearning

Paper Review of How Fair Can We Go: Detecting the Boundaries of Fairness Optimization in Information Retrieval (Gao, et al., 2019)

1. Background
Gao et al. observe that, despite the variety of optimization algorithms and the theoretical analysis that accompanies them, the performance of those algorithms in real-world applications often depends heavily on the data. In this work, they propose a framework that offers a perspective on optimization with fairness constraints. The framework aims to estimate the solution space, which maps out all achievable combinations of utility and degree of fairness. It lets researchers place each optimization policy in this space and analyze which solutions each policy can achieve. Once the solution space is defined, the limits of each policy in terms of optimal values are fixed. Moreover, the framework can be applied across various types of IR systems and is not limited to particular data. It also captures the influence of a heavily utility-focused system.

2. Research Questions

What is the solution space characterized by the given data?
What effect does introducing fairness bring to the system?
Can we identify the solution space to help us trade off different optimization policies and guide us in picking suitable algorithms and/or making adjustments to the data?
What does it mean if we cannot achieve optimal fairness on the given dataset? What are the implications of the solution space in this scenario?

3. Data and Methods

Synthetic dataset: Bernoulli Random Variables (100 random points)
Real-world dataset: YOW RSS feeds database - a collection of 21 users’ feedback on RSS news feeds
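
To make the idea of a solution space over synthetic data more concrete, here is a minimal sketch in Python. It is not the authors' implementation: this review does not record the paper's exact utility or fairness definitions, so the group labels, the selection size `k`, the utility (mean relevance of the selected items), and the fairness score (how balanced the two groups are in the selection) are all assumptions made for illustration.

```python
import random


def sample_solution_space(n_items=100, k=10, n_samples=2000, seed=0):
    """Sample (utility, fairness) points from random size-k selections.

    Assumptions (not from the paper): every item gets a Bernoulli(0.5)
    relevance label and a Bernoulli(0.5) group label; utility is the mean
    relevance of the selection; fairness is 1 when both groups are equally
    represented and 0 when only one group is selected.
    """
    rng = random.Random(seed)
    relevance = [1 if rng.random() < 0.5 else 0 for _ in range(n_items)]
    group = [1 if rng.random() < 0.5 else 0 for _ in range(n_items)]

    points = []
    for _ in range(n_samples):
        chosen = rng.sample(range(n_items), k)
        utility = sum(relevance[i] for i in chosen) / k
        share = sum(group[i] for i in chosen) / k  # fraction from group 1
        fairness = 1.0 - 2.0 * abs(share - 0.5)
        points.append((utility, fairness))
    return points


if __name__ == "__main__":
    pts = sample_solution_space()
    print("max utility seen: ", max(u for u, _ in pts))
    print("max fairness seen:", max(f for _, f in pts))
```

Plotting these points on a utility/fairness plane gives a rough empirical picture of which combinations are reachable for this particular dataset. Random sampling only approximates the space; the paper estimates theoretical boundaries rather than relying on samples alone.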

4. Findings
Gao et al. propose a framework for analyzing fairness problems in IR. The framework depicts the solution space by estimating theoretical boundaries and optimal solution values. The solution space makes analysis feasible on datasets where it would otherwise be difficult. Researchers can plug customized utility functions and fairness constraints into the framework and apply them to any dataset. Gao et al. first depict and explain the solution space for both synthetic and real-world data. They then discuss how the framework can be used and what impact it has on IR problems. According to Gao et al., many previous works have shown that increasing fairness negatively affects performance, as measured by retrieval accuracy, relevance score, user satisfaction with personalization and recommendations, and so on. However, their experiments reveal that this is not necessarily true for some datasets. With a dataset that already tends toward good fairness, we can achieve optimal fairness and maximize relevance at the same time with minimal trade-off. The solution space framework makes this visible.

Gao et al. categorize fairness in IR optimization problems as follows:

optimize the utility, subject to a set of fairness constraints
optimize fairness while guaranteeing a lower bound on the utility
jointly optimize for both utility and fairness to achieve an overall satisfaction
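
The three categories above can be written compactly as optimization problems. The notation here is assumed for illustration and is not taken from the paper: π is a candidate ranking or selection, U(π) its utility, F(π) its fairness score, ε and δ are lower bounds, and λ is a trade-off weight. A single fairness constraint stands in for the "set of fairness constraints", and a weighted sum is only one possible way to express joint optimization.

```latex
\begin{align*}
\text{(1)}\quad & \max_{\pi}\; U(\pi) \quad \text{s.t.}\quad F(\pi) \ge \epsilon \\
\text{(2)}\quad & \max_{\pi}\; F(\pi) \quad \text{s.t.}\quad U(\pi) \ge \delta \\
\text{(3)}\quad & \max_{\pi}\; \lambda\, U(\pi) + (1 - \lambda)\, F(\pi), \qquad \lambda \in [0, 1]
\end{align*}
```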

Experiments show that if the solution space is concentrated around high relevance and high fairness, all three types of policy are applicable. If the majority of solutions have high relevance and low fairness, it is better to optimize for fairness subject to a high relevance constraint. If the majority of solutions have high fairness and low relevance, it is better to optimize relevance subject to a high fairness constraint. If the solution space is concentrated around low relevance and low fairness, the policy that jointly optimizes both dimensions is more practical. If the solution space shows that no solution with optimal fairness exists, one may need to look for bias in the data collection and/or review the algorithms. Thus, the framework is also useful for deciding whether the researcher needs to re-examine the dataset.
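
The decision rules in the paragraph above can be sketched as a small lookup. This is a hypothetical helper, not something the paper provides: the function names, the 0.5 threshold separating "high" from "low", and the use of a simple majority count to decide where the solution space is "concentrated" are all assumptions.

```python
from collections import Counter

# Keys are (relevance is high, fairness is high); values paraphrase the
# recommendations summarized in this review.
POLICY_BY_REGION = {
    (True, True): "any of the three policies",
    (True, False): "optimize fairness subject to a high relevance constraint",
    (False, True): "optimize relevance subject to a high fairness constraint",
    (False, False): "jointly optimize relevance and fairness",
}


def recommend_policy(points, threshold=0.5):
    """Suggest a policy from (relevance, fairness) points scaled to [0, 1].

    Assumption: the region holding the most points is where the solution
    space is 'concentrated'. Ties go to the region encountered first.
    """
    if not points:
        raise ValueError("points must not be empty")
    regions = Counter(
        (rel >= threshold, fair >= threshold) for rel, fair in points
    )
    dominant_region, _ = regions.most_common(1)[0]
    return POLICY_BY_REGION[dominant_region]


def optimal_fairness_reachable(points, optimum=1.0, tol=1e-9):
    """Return True if at least one sampled solution attains optimal fairness."""
    return any(fair >= optimum - tol for _, fair in points)


if __name__ == "__main__":
    sample = [(0.9, 0.2), (0.8, 0.1), (0.7, 0.9)]
    print(recommend_policy(sample))
    print(optimal_fairness_reachable(sample))
```

If `optimal_fairness_reachable` returns False for a thorough sample of the solution space, that corresponds to the case where the review suggests checking the data collection for bias.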