Temporal video segmentation with natural language using text–video cross attention and Bayesian order-priors