Actually I haven't read all of what you posted, but I've worked for quite a while with this stuff and I can share some thoughts with you.
Most likely, facial recognition software uses a mix of real-time classification (of individual features like eyes, mouths, etc.) and segmentation with tracking (outlining an entire object and following it as its own entity through the frames of a video such as a CCTV feed, e.g. telling apart two people who are walking side by side, which strictly speaking is instance segmentation rather than plain semantic segmentation).
The way these systems are trained (assuming they're Convolutional Neural Networks) is by
>taking a huge amount of input data (generally speaking, big govts use social media apps to grab as many photos of people's faces as they can, then a good chunk of those photos is analyzed and their overall class and/or features are labeled by hand by a human specialist),
>making the filters of said convnet learn the basic features of the pictures (to make it as babby simple as possible, think of a table: the first layer of filters analyzes all areas of high contrast in a picture, like sudden horizontal and vertical changes, which correspond to the base and the legs of a table; the second layer MIXES said basic filter outputs, i.e. it can find the CORNERS of the table; the third layer might find rounder creases on the wooden fillings of the table and so on and so forth), and
>tuning the non-trainable parameters of the model (the hyperparameters) to better fit the problem at hand, which is to say, trying to fix any errors that crop up (False Positives/Negatives for a given class or segment).
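To make the table example concrete, here's a minimal sketch in plain Python. Big caveat: the two kernels are hand-picked by me, while a real convnet learns its filters from data, and multiplying the two edge maps is just a stand-in for what a second layer does (real layers use weighted sums and a nonlinearity). All the names (`correlate2d`, `VERTICAL_EDGE`, `HORIZONTAL_EDGE`, `corner`) are made up for illustration.

```python
# Toy "first layer": two hand-picked 3x3 filters that respond to sudden
# horizontal and vertical changes in brightness. (Assumption: a trained
# convnet would learn kernels like these on its own.)
VERTICAL_EDGE = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -1, -1],
                   [0, 0, 0],
                   [1, 1, 1]]


def correlate2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses only, like the usual convnet activation."""
    return [[max(0, v) for v in row] for row in feature_map]


# 8x8 picture: a bright square (the "table top") in the lower right,
# so its top-left corner sits at pixel (3, 3).
image = [[1 if r >= 3 and c >= 3 else 0 for c in range(8)] for r in range(8)]

# "Layer 1": one map per filter.
vertical = relu(correlate2d(image, VERTICAL_EDGE))
horizontal = relu(correlate2d(image, HORIZONTAL_EDGE))

# "Layer 2" stand-in: a spot counts as a corner only if BOTH edge
# filters fire there, so straight edges score zero.
corner = [[v * h for v, h in zip(v_row, h_row)]
          for v_row, h_row in zip(vertical, horizontal)]

# Strongest corner response; add 1 to map back to image coordinates,
# because each output cell is centered one pixel into the 3x3 window.
best = max((corner[i][j], i + 1, j + 1)
           for i in range(len(corner)) for j in range(len(corner[0])))
print("strongest corner response at pixel", best[1:])
```

The point is just that neither filter knows what a corner is on its own; the corner only shows up once you mix their outputs.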
Now, why did I have to type all this technological mumbo jumbo? Because at their core, these surveillance systems are just slightly more advanced neural networks with a human overseer to spot non-trivial features (for instance, a human is needed to tell a picture on an advertisement poster apart from a real-life human standing near it). One way to trick the artificial part of the system is, you guessed it, creating artificial contrast. The anti-surveillance makeup you see popping up from time to time in cyberpunk flicks is meant to do just that (usually it's straight black lines over the most important features like mouths or eyes). A more sophisticated way to achieve the same effect is to paint extra features like ears or eyes or the back of a person onto the target: the system tries real hard to find items that match certain features, fails, and scores your features with low probabilities, meaning that you may still be found but they can't quite get your picture right. What you see in >>180 is also pretty ingenious, as these people have basically created pictures whose features blend into each other, meaning that the convnets can still find them, but the usable data on them is so scarce that the network assigns them the wrong class.
Now, convnets themselves are piss easy to build. All you need is a pc with an off-the-shelf consumer gpu, python with tensorflow-gpu / keras installed, a bucketload of pictures of faces or of people wearing teeshirts (super easy to find, there's a ton of Kaggle competitions for both classification and semantic segmentation, one of them is bound to be about either) and a couple of days to train the thing properly. You instantiate either a custom-made or a pre-built convnet, train it on the dataset following some tutorials, then test it out on pictures you find on the internet. If the probability of a picture containing faces of a certain kind (e.g. faces in profile vs faces looking straight on) is high enough, the system will mark it as such. However, what you can do at that point is manually edit the picture to add or remove features and test it on the same convnet, over and over, until the predicted probabilities get real low. That's when you know that your design or tattoos can easily fuck up a basic system.
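The edit-and-retest loop can be sketched like this. Note that this is only a sketch under assumptions: `toy_face_score` is a fake scorer I made up so the thing runs without a trained network, and `occlude`, `greedy_occlusion` and `FEATURE_CELLS` are hypothetical names too. With a real Keras model you'd swap the scorer for a small wrapper around `model.predict` that returns the face probability.

```python
import copy

# Top-left corners of the 2x2 regions the toy scorer looks at
# (two "eyes" and a "mouth"). Purely illustrative.
FEATURE_CELLS = [(0, 0), (0, 4), (4, 2)]


def toy_face_score(image):
    """Stand-in for a convnet: fraction of feature pixels still bright."""
    total = 0.0
    for top, left in FEATURE_CELLS:
        for r in range(top, top + 2):
            for c in range(left, left + 2):
                total += image[r][c]
    return total / (4 * len(FEATURE_CELLS))


def occlude(image, top, left, size):
    """Return a copy of the image with one black square painted on it."""
    edited = copy.deepcopy(image)
    for r in range(top, min(top + size, len(edited))):
        for c in range(left, min(left + size, len(edited[0]))):
            edited[r][c] = 0.0
    return edited


def greedy_occlusion(image, score_fn, size=2, max_patches=5, threshold=0.5):
    """Keep painting the patch that hurts the score most, then retest."""
    current = image
    history = [score_fn(current)]
    patches = []
    while history[-1] >= threshold and len(patches) < max_patches:
        best = None
        for top in range(0, len(current), size):
            for left in range(0, len(current[0]), size):
                candidate = occlude(current, top, left, size)
                score = score_fn(candidate)
                if best is None or score < best[0]:
                    best = (score, (top, left), candidate)
        if best[0] >= history[-1]:
            break  # no single patch lowers the score any further
        history.append(best[0])
        patches.append(best[1])
        current = best[2]
    return current, patches, history


# 6x6 all-bright "face"; the scorer starts out fully confident.
face = [[1.0] * 6 for _ in range(6)]
edited, patches, history = greedy_occlusion(face, toy_face_score)
print("patches painted at:", patches)
print("score after each edit:", [round(s, 2) for s in history])
```

Greedy search is the dumbest strategy that works here: it only ever needs the score the model spits out, which matches the "edit, test, repeat" routine you'd do by hand.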
If you wanna learn more about this, I strongly suggest you study some Image Processing. It's eye opening how stupid the procedure is and yet how deeply it affects our everyday lives.