Workshop Date: Jan 07, 2024
Location: WAIKOLOA, HAWAII
Held in conjunction with WACV2024
Welcome to the 3rd Workshop on Image/Video/Audio Quality in Computer Vision and Generative AI!
Many machine learning tasks and computer vision algorithms are susceptible to image/video/audio quality artifacts. Nonetheless, most visual learning and vision systems assume high-quality image/video/audio as input. In reality, noise and distortion are common in the image/video/audio capture and acquisition process, and artifacts are often introduced during compression, transcoding, transmission, decoding, and/or rendering. All of these quality issues play a critical role in the performance of learning algorithms, systems, and applications, and can therefore directly impact the customer experience.
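To make the impact concrete, the following minimal sketch (ours, not part of the workshop materials; the input path face.png is a placeholder) re-encodes an image at decreasing JPEG quality and reports the resulting PSNR, showing how compression artifacts measurably degrade the signal that downstream vision models consume:

```python
# Minimal sketch: quantify JPEG compression artifacts via PSNR.
# "face.png" is a placeholder path, not a file shipped with the workshop.
import io

import numpy as np
from PIL import Image

def psnr(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0**2 / mse)

img = Image.open("face.png").convert("RGB")  # placeholder input
ref = np.asarray(img)
for quality in (90, 50, 10):
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)  # introduce blocking/ringing artifacts
    buf.seek(0)
    decoded = np.asarray(Image.open(buf).convert("RGB"))
    print(f"JPEG quality={quality:3d}: PSNR = {psnr(ref, decoded):.2f} dB")
```

Lower PSNR at lower quality settings is exactly the kind of degradation that can silently erode the accuracy of a model trained on clean inputs.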
This workshop addresses topics related to image/video/audio quality in machine learning, computer vision, and generative AI. The topics include, but are not limited to:
Time | Event |
---|---|
8:30-8:35am | Opening remarks (5mins) |
8:35-9:25am | Keynote 1 (50mins) Keynote Speaker: Ivan V. Bajić, "Visual Coding for Humans and Machines" |
9:25-9:35am | Break (10mins) |
9:35-10:35am | Oral presentation 1 (60mins) • Paper 2 (Long Presentation): Enhancing Surveillance Camera FOV Quality via Semantic Line Detection and Classification with Deep Hough Transform • Paper 6 (Long Presentation): AutoCaCoNet: Automatic Cartoon Colorization Network using self-attention GAN, segmentation, and color correction • Paper 22 (Short Presentation): HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment • Paper 4 (Short Presentation, virtual): A Diffusion-based Method for Multi-turn Compositional Image Generation • Paper 5 (Short Presentation, virtual): Perceptual Synchronization Scoring of Dubbed Content using Phoneme-Viseme Agreement |
10:35-10:45am | Break (10mins) |
10:45am-11:35am | Keynote 2 (50mins) Keynote Speaker: Kevin Bowyer, "Gray Face: Could Grayscale Be Better Than Color for Face Recognition?" |
11:35-11:45am | Break (10mins) |
11:45am-12:05pm | Lessons learned from the Grand Challenge and invited talk from a participant (20mins) |
12:05-1:00pm | Lunch break |
1:00-1:50pm | Keynote 3 (50mins) Keynote Speaker: Andrew Segall, "Recent Advances in Deep Learning for Video Compression" |
1:50-2:00pm | Break (10mins) |
2:00-2:50pm | Oral presentation 2 (50mins) • Paper 7 (Long Presentation): Impact of Blur and Resolution on Demographic Disparities in 1-to-Many Facial Identification • Paper 11 (Long Presentation): ARNIQA: Learning Distortion Manifold for Image Quality Assessment • Paper 1 (Short Presentation): Noise-free audio signal processing in noisy environment: a hardware and algorithm solution • Paper 9 (Short Presentation): RealPixVSR: Pixel-Level Visual Representation Informed Super-Resolution of Real-World Videos |
2:50-3:00pm | Break (10mins) |
3:00-3:40pm | Oral presentation 3 (40mins) • Paper 12 (Long Presentation): DeepLIR: Attention-based approach for Mask-Based Lensless Image Reconstruction • Paper 21 (Long Presentation): Super Efficient Neural Network for Compression Artifacts Reduction and Super Resolution • Paper 16 (Short Presentation, virtual): Consolidating separate degradations model via weights fusion and distillation |
3:40-3:50pm | Break (10mins) |
3:50-4:20pm | Oral presentation 4 (30mins) • Paper 17 (Short Presentation): A Lightweight Generalizable Evaluation and Enhancement Framework for Generative Models and Generated Samples • Paper 24 (Short Presentation, virtual): Generating Point Cloud Augmentations via Class-Conditioned Diffusion Model • Paper 27 (Short Presentation, virtual): Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video |
4:20-4:25pm | Closing remarks (5mins) |
Zoom Information for virtual presentations:

Topic: VAQ Workshop - WACV2024
Time: Jan 7, 2024 08:15 AM Hawaii

Join Zoom Meeting: https://us06web.zoom.us/j/83363653781?pwd=bqXbapbhmjfwSI104k4GAf4sa9YS9X.1
Meeting ID: 833 6365 3781
Passcode: 500582

One tap mobile:
• +12532158782,,83363653781#,,,,*500582# US (Tacoma)
• +12532050468,,83363653781#,,,,*500582# US

Dial by your location:
• +1 253 215 8782 US (Tacoma)
• +1 253 205 0468 US
• +1 719 359 4580 US
• +1 346 248 7799 US (Houston)
• +1 669 444 9171 US
• +1 669 900 6833 US (San Jose)
• +1 929 436 2866 US (New York)
• +1 301 715 8592 US (Washington DC)
• +1 305 224 1968 US
• +1 309 205 3325 US
• +1 312 626 6799 US (Chicago)
• +1 360 209 5623 US
• +1 386 347 5053 US
• +1 507 473 4847 US
• +1 564 217 2000 US
• +1 646 931 3860 US
• +1 689 278 1000 US

Find your local number: https://us06web.zoom.us/u/kd6Xbti4fC
Keynote 1: Ivan V. Bajić, "Visual Coding for Humans and Machines"
Abstract: Visual content is increasingly being used for more than human viewing. For example, traffic video is automatically analyzed to count vehicles, detect traffic violations, estimate traffic intensity, and recognize license plates; images uploaded to social media are automatically analyzed to detect and recognize people, organize images into thematic collections, and so on; and visual sensors on autonomous vehicles analyze captured signals to help the vehicle navigate, avoid obstacles and collisions, and optimize its movement. These applications require continuous machine-based analysis of visual signals, with only occasional human viewing, which necessitates rethinking the traditional approaches to image and video compression. This talk is about coding visual information in ways that enable efficient usage by machine learning models, in addition to human viewing. We will touch upon recent rate-distortion results in this field, describe several designs for human-machine image and video coding, and briefly review related standardization efforts.
Bio: Ivan V. Bajić is a Professor of Engineering Science and co-director of the Multimedia Lab at Simon Fraser University, Canada. His research interests include signal processing and machine learning with applications to multimedia processing, compression, and collaborative intelligence. His group’s work has received several research awards, including the 2023 TCSVT Best Paper Award, conference paper awards at ICME 2012, ICIP 2019, MMSP 2022, and ISCAS 2023, and other recognitions (e.g., paper award finalist, top n%) at Asilomar, ICIP, ICME, ISBI, and CVPR. Ivan has served on the organizing and/or program committees of the main conferences in his field, and has received several awards in these roles, including Outstanding Reviewer Award (six times), Outstanding Area Chair Award, and Outstanding Service Award. He was on the editorial boards of the IEEE Transactions on Multimedia and IEEE Signal Processing Magazine, and is currently a Senior Area Editor of the IEEE Signal Processing Letters.
Keynote 2: Kevin Bowyer, "Gray Face: Could Grayscale Be Better Than Color for Face Recognition?"
Abstract: Informally, color images may be perceived as higher quality than grayscale. All major face image datasets used by the research community contain RGB color images, and all deep CNN face matchers process color images. But do CNN face matchers that work with color face images achieve better accuracy than equivalent matchers working with equivalent grayscale images? This talk examines this question in detail and concludes that grayscale could in fact be better than color for deep CNN face matchers.
Bio: https://engineering.nd.edu/faculty/kevin-bowyer/
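To make the talk's question concrete, here is a minimal sketch (our illustration, not code from the talk) of one plausible way to feed grayscale faces to a matcher trained on RGB input: reduce each image to ITU-R BT.601 luma and replicate it across three channels so the network's expected input shape is unchanged.

```python
# Minimal sketch: convert an RGB face crop to grayscale while keeping a
# 3-channel layout, so an RGB-trained CNN matcher can ingest it unmodified.
import numpy as np

def to_grayscale_3ch(rgb: np.ndarray) -> np.ndarray:
    """Map an HxWx3 uint8 RGB image to BT.601 luma, replicated to 3 channels."""
    luma = rgb @ np.array([0.299, 0.587, 0.114])   # BT.601 luma weights
    gray = np.clip(luma, 0, 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)   # identical channels

face = np.random.randint(0, 256, (112, 112, 3), dtype=np.uint8)  # dummy face crop
print(to_grayscale_3ch(face).shape)  # (112, 112, 3)
```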
Keynote 3: Andrew Segall, "Recent Advances in Deep Learning for Video Compression"
Abstract: The field of image and video quality is rapidly advancing due to improvements in computer vision, machine learning, and neural network design. However, these quality methods are often used to evaluate data that has been compressed, and the compression of images and videos is itself advancing due to the same improvements in computer vision, machine learning, and neural networks. In this talk, we survey recent developments in image and video coding with an emphasis on the use of deep learning. Both end-to-end solutions and enhancements to existing systems are included. Additionally, recent efforts to create datasets sampling these methods are introduced.
Bio: Andrew Segall is currently the Head of Video Coding Standards at Amazon Prime Video. Previously, he was a Director at Sharp Labs of America, where he led the Department of Systems, Algorithms and Services while simultaneously holding the position of Distinguished Scientist at Sharp Corporation. He is an active participant in the international standardization community and has developed and contributed technology to the Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC), Advanced Video Coding (H.264/AVC), and ATSC 3.0 projects. He currently serves as co-chair of the Neural Network Video Coding activity in the Joint Video Experts Team (JVET) of ITU-T SG16 Question 6 and ISO/IEC JTC1/SC29/WG5, HDR Chair for the MPEG Visual Quality Assessment Advisory Group (ISO/IEC JTC1/SC29/AG5), and represents Amazon on the Alliance for Open Media (AOM) Steering Committee. He received his B.S. and M.S. degrees in electrical engineering from Oklahoma State University, and his Ph.D. degree in electrical engineering from Northwestern University.
Alongside the workshop, we are hosting a Grand Challenge on the identification of audiovisual synchronization (AV-Sync) errors. An AV-Sync error, the temporal misalignment of audio relative to the video, is a defect that can have a large impact on a viewer's Quality of Experience. The International Telecommunication Union's Rec. ITU-T J.248, "Requirements for operational monitoring of video-to-audio delay in the distribution of television programs" (2008), used subjective experiments to determine that participants could detect an AV-Sync error if the audio leads the video by more than 45 ms or lags the video by more than 125 ms.
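For concreteness, the thresholds above can be expressed as a small predicate; this is our sketch, and the sign convention (negative offsets for audio leading the video) is an assumption, not part of the recommendation:

```python
# Minimal sketch of the ITU-T J.248 detectability thresholds quoted above.
# Assumed sign convention: negative offset = audio leads, positive = audio lags.
def av_sync_error_detectable(offset_ms: float) -> bool:
    """Return True if a viewer is expected to notice the audio/video offset."""
    if offset_ms < 0:              # audio leads the video
        return -offset_ms > 45.0   # detectable beyond 45 ms of lead
    return offset_ms > 125.0       # detectable beyond 125 ms of lag

for offset in (-60, -45, 0, 100, 126):
    print(f"{offset:+4d} ms -> detectable: {av_sync_error_detectable(offset)}")
```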
The Grand Challenge will consist of two tasks:
The two tasks will have a combined prize fund of $12,000. We invite participants to enter one or both of the tasks at the links above, with the following timelines:
TBD
If you have any questions or inquiries, please contact us at wacv2024-ws-iva-quality@amazon.com.