> Both are in English, so I don't see why they can't just transcribe what's being said.

Try writing subtitles yourself, and you'll probably see why. There's a limited space and time budget - people need to be able to read the thing without spending too much time focusing on it. Sometimes, if it's a quick back and forth, you even have to leave out some of the dialogue. Now what do you cut?

And this is just for subtitles in the same language. Imagine also facing translation issues - where you are not only trying to roughly hit the same meaning and connotations, but also need to stay relatively close to the acting, e.g. you can't translate a strong exclamation into something too soft. And there's also the difference between spoken and written language, e.g. viewers may accept someone saying "fuck that", but seeing it written down as "- Fuck that" feels awkward.

And that's probably also why people are doing it in their spare time, perhaps after having seen a bad job done on the official release. I imagine doing it will also get you really close to understanding how the movie is built up.

BBC or SBS shows non-English movies/series, I guess.

I wish there was a site where you could put money towards English subs for things!! I guess judging the quality wouldn't be so easy, etc.

Just in the last few days I've had these subtitle-related issues: no English subtitles available for S04 of Rita (Danish). So I made English subtitles for E01 by machine translating existing German and Portuguese subs and using the 'original' German for reference. Turned out pretty great, but took an hour or two. No way I'd do the whole season like that, though.
ELI5: You can turn this problem into finding the best "convolution index", and Fourier transforms make computing convolutions cheaper.

(Note that \* means multiplication; there doesn't seem to be a way to escape an asterisk.)

Let's start by seeing how this is a convolution. We have the videoSpeech sequence and the subtitle sequence - each is a vector, indexed by time, of 0's and 1's indicating whether there is speech at that time. We can imagine padding the sequences out on either side with 0's, and consider the alignment task as shifting the subtitle sequence left and right in time until we get the best alignment with the 1's in the videoSpeech sequence.

We can express the goodness of alignment as the number of matching 1's, aka the sum over all times t of videoSpeech(t) \* subtitle(t). This is the definition of a convolution: the convolution of two sequences gives a new sequence where the value at index i is the sum above with one of the sequences shifted by i. Mathematically, conv(videoSpeech, subtitle)(i) = sum(videoSpeech(t) \* subtitle(t-i)). So we can rephrase the problem as: find the index i which maximizes the value of the convolution sequence.

The discrete Fourier transform is a function that takes a sequence and gives another sequence. It's relevant here because it "turns convolution into multiplication": fourier(videoSpeech)(i) \* fourier(subtitle)(i) = fourier(conv(videoSpeech, subtitle))(i). So finally, to solve the problem, we get the pointwise product sequence S = fourier(videoSpeech) \* fourier(subtitle), do the inverse Fourier transform on it, invFourier(S), and maximize invFourier(S)(i) over i.

It has nothing to do directly with FFT, which is why it is confusing:

What he is looking for is maximal correlation between two binary series (think "Pearson's r or r-square correlation coefficient"). Now, correlation is just like convolution except one series is flipped around on the time axis. Which means, if you have an efficient way to compute convolutions (and you do, through FFT), you have an efficient way to compute correlations. (I don't think 5-year-olds have heard of Pearson's correlation coefficient or convolutions, but...)

In many ways, correlation and covariance are more fundamental than convolution - they are closely and directly related to the inner product.

And the only reason to use the FFT here is convenience (i.e., it is fast and simple to compute), but the correlation/convolution property applies to many "transform domains" - Laplace, Z, Cosine, Sine, and a few others (all of which are closely related among themselves, but only a few of which are easy to compute numerically).
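
For anyone who wants to poke at this, here is a minimal sketch of the shift-and-score idea from this thread, assuming NumPy is available. The function names (`overlap_scores_bruteforce`, `overlap_scores_fft`, `best_shift`) are made up for illustration and are not taken from any subtitle tool. One caveat: the shifted sum sum(videoSpeech(t) \* subtitle(t-i)) is, strictly speaking, a cross-correlation, so the FFT version multiplies by the complex conjugate of the subtitle transform (the "flipped on the time axis" point), rather than by the plain product.

```python
import numpy as np


def overlap_scores_bruteforce(video_speech, subtitle):
    """Count matching 1's for every shift i: sum over t of video[t] * sub[t - i]."""
    n, m = len(video_speech), len(subtitle)
    scores = []
    for i in range(-(m - 1), n):          # every shift with possible overlap
        total = 0
        for t in range(n):
            if 0 <= t - i < m:            # outside the subtitle counts as 0 padding
                total += video_speech[t] * subtitle[t - i]
        scores.append(total)
    return np.array(scores)


def overlap_scores_fft(video_speech, subtitle):
    """Same scores via the FFT; returns (scores, shifts)."""
    n, m = len(video_speech), len(subtitle)
    size = n + m - 1                      # zero-pad so circular wrap-around is harmless
    fv = np.fft.rfft(video_speech, size)
    fs = np.fft.rfft(subtitle, size)
    # Multiplying by the conjugate gives correlation instead of convolution.
    corr = np.fft.irfft(fv * np.conj(fs), size)
    # Negative shifts are stored at the end of the array; rotate them to the front.
    corr = np.roll(corr, m - 1)
    shifts = np.arange(-(m - 1), n)
    return np.rint(corr).astype(int), shifts


def best_shift(video_speech, subtitle):
    """Shift (in samples) of the subtitle track that maximizes the overlap."""
    scores, shifts = overlap_scores_fft(video_speech, subtitle)
    return int(shifts[np.argmax(scores)])


if __name__ == "__main__":
    video = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
    sub = [1, 1, 0, 1]
    print(best_shift(video, sub))         # 3: the subtitle pattern starts at t = 3
```

The brute-force version costs roughly n \* m multiplications, while the FFT version costs on the order of (n + m) log(n + m), which is the whole reason the Fourier transform shows up here at all.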