Goal: Increase user satisfaction with the retrieved results when searching for similar images through an image database.
Problem: Images with very similar visual content were ranked far apart because of differences in their textual elements.
Challenges
Machine Learning and Predictive Analytics:
The text present in the images varied widely in font, and was sometimes a stylized element integrated into the image itself.
The images themselves were heterogeneous, ranging from photos to art, sketches, or branding elements (book covers, packaging).
The available data was insufficient, so training pairs had to be generated artificially: the same image with text and without text.
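The pair-generation idea above can be sketched as follows. This is a minimal illustration, not the project's actual code: a real pipeline would rasterize random strings in many fonts (e.g. with Pillow), whereas here a random high-contrast glyph pattern stands in for rendered text, and all function names and sizes are assumptions.

```python
import numpy as np

def make_text_pair(clean, rng, box_h=8, box_w=24):
    """Create a synthetic (image_with_text, text_mask) pair from a clean image.

    `clean` is a float image in [0, 1]. A random binary pattern inside a box
    stands in for rasterized text; the returned mask marks the overwritten
    pixels, giving pixel-level ground truth for both pipeline stages.
    """
    h, w = clean.shape[:2]
    y = rng.integers(0, h - box_h)
    x = rng.integers(0, w - box_w)
    with_text = clean.copy()
    glyphs = rng.random((box_h, box_w)) > 0.5   # glyph-like on/off pattern
    with_text[y:y + box_h, x:x + box_w][glyphs] = 1.0  # "white" text pixels
    mask = np.zeros((h, w), dtype=bool)
    mask[y:y + box_h, x:x + box_w] = glyphs
    return with_text, mask

rng = np.random.default_rng(0)
clean = rng.random((64, 64)) * 0.5              # stand-in for a real image
with_text, mask = make_text_pair(clean, rng)
```

The clean image serves as the inpainting target and the mask as the segmentation target, so one source image yields supervision for both stages.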
Algorithms
Machine Learning and Predictive Analytics:
The suggested approach consisted of a sequential pipeline: text segmentation followed by image inpainting.
For text segmentation, a neural network with a transformer-like architecture was trained on multiple fonts.
The second step was to recolor the pixels that previously contained text so that the image remained visually consistent. Several inpainting techniques were tested, but the most visually appealing results were obtained with a Generative Adversarial Network (GAN), in which a generator network fills in the missing parts of the image and a discriminator network judges how realistic the result looks by comparing it to the true image.
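The two-stage pipeline can be sketched end to end as below. Both stages are deliberately simplified placeholders, not the models described above: a brightness threshold stands in for the transformer-based segmenter, and an iterative neighbor-averaging fill stands in for the GAN generator, whose adversarial training is out of scope here. All names and thresholds are assumptions for illustration.

```python
import numpy as np

def segment_text(image, threshold=0.9):
    """Placeholder segmenter: marks near-white pixels as text.

    Stands in for the transformer-like segmentation network.
    """
    return image > threshold

def inpaint_masked(image, mask, iters=200):
    """Placeholder inpainter: diffusion-style neighbor averaging.

    Stands in for the GAN generator; it repeatedly replaces masked
    pixels with the mean of their 4-neighborhood.
    """
    out = image.copy()
    out[mask] = out[~mask].mean()               # crude initialization
    for _ in range(iters):
        up    = np.roll(out, -1, axis=0)
        down  = np.roll(out,  1, axis=0)
        left  = np.roll(out, -1, axis=1)
        right = np.roll(out,  1, axis=1)
        avg = (up + down + left + right) / 4.0
        out[mask] = avg[mask]                   # only masked pixels change
    return out

def remove_text(image):
    """Sequential pipeline: segmentation followed by inpainting."""
    mask = segment_text(image)
    return inpaint_masked(image, mask)

# Toy example: a flat gray image corrupted with a "text" block.
img = np.full((32, 32), 0.5)
img[10:14, 10:20] = 1.0                         # bright text-like region
restored = remove_text(img)
```

In the real system the discriminator's realism judgment drives the generator's training; this sketch only shows how the two stages compose at inference time.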
Results
Integrating this feature into the image search engine led to higher user satisfaction with the retrieved results.
Impact
Improve the patient experience by reducing diagnostic waiting times.
Streamline the radiologists' workflow, support novice radiologists in diagnosis, and improve their ramp-up process.