Machine learning and Morph Identification - It actually kinda works

chesterhf · October 28, 2020, 7:46pm

So I’ve been working on building a machine learning program to identify morphs just for fun, and to my surprise it’s actually not half bad. It’s been trained on ~350 images so far using only a handful of basic morphs (Normal, Spider, Mojave, Leopard, Butter/lesser and Clown) and is currently running at a combined accuracy of 96%.

Current guessing accuracy:

Clown - 98%
Leopard - 95%
Normal - 100%
Mojave - 91%
Lesser/Butter - 91%
Spider - 98%
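
For anyone curious how a combined number like that relates to the per-morph ones, here is a rough sketch. It assumes a plain unweighted average over the six labels. The tool may instead weight each label by how many images it has, and those counts aren't given here, so treat this as illustrative arithmetic only. The function names are made up for the example.

```python
# Per-morph accuracies as listed above (percent).
per_class_accuracy = {
    "Clown": 98,
    "Leopard": 95,
    "Normal": 100,
    "Mojave": 91,
    "Lesser/Butter": 91,
    "Spider": 98,
}


def macro_average(accuracies):
    """Unweighted mean of the per-class accuracies.

    Assumption: every class counts equally, regardless of image count.
    """
    return sum(accuracies.values()) / len(accuracies)


def weakest_classes(accuracies):
    """Return the label(s) with the lowest accuracy, sorted by name."""
    lowest = min(accuracies.values())
    return sorted(name for name, acc in accuracies.items() if acc == lowest)


combined = macro_average(per_class_accuracy)
print(f"combined accuracy: {combined:.1f}%")
print("weakest:", ", ".join(weakest_classes(per_class_accuracy)))
```

Under that assumption the average comes out to 95.5%, which rounds to the 96% quoted.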

Made into an app, this would actually be pretty useful for all the “I bought this at partsmart as a fancy ball python, what morph is it?” situations, or for picking out morphs from complex combos.

Some major observations:

It will never be able to predict new combos and can only be trained on morphs that already exist.
It’s excessively time consuming and takes some computing power
There’s an incredible amount of variation even within single gene morphs or normal/WT
We don’t realize just how good the human eye is at picking out patterns. I don’t know that a program is capable of doing better than an experienced/well-trained human

tl;dr: machine learning can identify some morphs but its usefulness is very limited

stackedbp · October 28, 2020, 8:16pm

Bump that up to 350,000 images and it might get pretty good. Totally do an app, even at current accuracy it would get downloads for novelty - and users could build the database. Definitely worth $0.99.

erie-herps · October 28, 2020, 8:37pm

Do you have any plans for what you might want to do with it, whether that’s building an application, expanding the program to cover a wider range of morphs, or making it more accurate? Adding onto what @stackedbp said, I also think it’d be a great app and good to have a user-built database. I think it would make a good freemium app: free to use, but users can pay for the full package. E.g. it’s free for a user to find out whether it’s a clown, normal, pastel, or mojave, but they have to pay $5 or so if they want access to morphs like fire, blade, enchi, etc. I think that if you did something with this program (like make it into a large, popular app) you could add in different species: leopard geckos, boas, hognoses, etc. I think this could have a lot of potential if you did something with it and added a larger database of images for it to learn from.
Do you think this could work from different viewpoints? Out of the 350 pictures that you’ve used so far, do they all have the same background, position, viewpoint, focus sharpness, etc.? If you did make an app out of it, you could let the user either upload or take a picture and then show a list of possible morphs ranked by most likely. The user could click on those morphs to see different variations and pictures of each and decide what morph it is. From there they could confirm that the selection was right, and the program would add the photo they uploaded to the database under that morph. If the user is still on the free version and the software recognizes the picture as a morph from the paid tier, the list could show an empty slot where that morph would be, with a note that the upgraded version is needed to see rarer morphs. The rarer morphs would usually be what entices users to buy the software. Overall I think this software could have a lot of opportunity, seeing how unique it is and the demand for that kind of software.

stackedbp · October 28, 2020, 8:53pm

I use an app for plant ID that is kind of like what you’re talking about. I get 5 free ID’s a month, or I could pay and get unlimited plus access to more information, a higher end database. (You take a photo of a plant, it analyzes, then gives a % match to a couple of different plants. It gives you the option to add your photo to the database if you get something with an exceptionally high match %.)
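
A minimal sketch of that "% match to a couple of candidates" idea, applied to morphs. The scores are made up for illustration, and the 95% cutoff for offering to add a photo to the database is an assumption, not something the plant app documents.

```python
def top_matches(scores, k=3):
    """Return the k highest-scoring labels as (label, percent) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(label, round(100 * score, 1)) for label, score in ranked[:k]]


def worth_adding(scores, threshold=0.95):
    """Offer to add the photo to the database only on a very high match.

    The threshold value is a hypothetical choice for this sketch.
    """
    return max(scores.values()) >= threshold


# Hypothetical classifier confidences for one uploaded photo.
example_scores = {
    "Mojave": 0.52,
    "Lesser/Butter": 0.41,
    "Normal": 0.04,
    "Pastel": 0.03,
}

for label, percent in top_matches(example_scores, k=2):
    print(f"{label}: {percent}% match")
print("offer to add to database:", worth_adding(example_scores))
```

Showing the runner-up alongside the top guess matters most for look-alike morphs, where the two scores can be close.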
A cool addition to a premium version would be current market value for the identified morph, or even market saturation.

erie-herps · October 28, 2020, 9:07pm

That would be cool with the added value and saturation. I think the main market for this would be pet owners, so I don’t know if they’d really need multiple IDs. They might just get a bp, be curious what morph it is, find out, and be done with it… until they catch the reptile bug and get a few more pythons. Even then they’re probably not going to need more than 1 ID every 2-3 months. And because it’ll probably be a pet owner, they might like to know a little bit about the morph: what causes it, when it was discovered, any phenotype effects it has on the snake, price fluctuations throughout the years, etc. Also, what you’re saying about the % match for plants could be useful for morphs too. The free version might just tell you that it has clown and mojave, but the paid version might tell you het clown and mojave (or some visual het or super form; since I don’t know much about bp genetics I can’t give an example).

chesterhf · October 28, 2020, 9:24pm

Oh absolutely, I’ve only been working on it for a few days so with more time and pictures I think it will definitely improve, it’s just going to take me a while to get there.

I would like to do both of these things, including more morphs and making it as accurate as possible!

I’ve actually been using mainly pictures from MM ads, so they are often on different backgrounds, include logos/other noise and include snakes from different angles, so it’s pretty good at identifying even if only half the snake is visible in the picture or it’s a little fuzzy. I’m definitely working on improving accuracy here, because it still struggles a bit with butter/lesser vs Mojave. It’s pretty good at all of the other morphs so far.

I have very little computational biology/machine learning skill, and absolutely no app development or program development skills, so I haven’t really thought of what to do with it beyond just having it as a fun tool. But if it gets to be pretty good/useful I will definitely share it with the community.

erie-herps · October 28, 2020, 9:29pm

How did you create the software? Did you code it, if so what language did you use?

chesterhf · October 28, 2020, 9:31pm

I actually didn’t really create anything, currently I’m using Microsoft Lobe which is pretty much drag and drop and fairly intuitive. Previously I was trying to use Orange and found it to be both difficult to use and not very accurate.

eaglereptiles · October 28, 2020, 9:44pm

Holy smokes Hilary, you actually built the darn thing.

A lot of people have been toying with this idea in their heads for a while but I don’t know of anyone that actually put the effort in.

How far have you pushed it so far in terms of stacked genes? Are we talking single genes or can it decipher 2/3/4 genes in one animal?

You would be extremely surprised. Dustin Sandlin recently worked on a simple device that attaches to preexisting CCTV cameras and can detect a firearm (and differentiate between a gun/phone/food/stapler…) present in a room, from any angle whatsoever.
The program did need a ridiculous number of photos of guns from films, TV, games, and CCTV footage (20,000+), but it worked. This is doable with today’s technology.

chesterhf · October 28, 2020, 9:47pm

So far it’s only doing single genes, I haven’t figured out how to do stacked genes/combos yet. I feel like I’d have to train those in separately, for example if it can detect leopard or banana on its own, it won’t be able to call a banana leopard without being trained on banana leopards.

chesterhf · October 28, 2020, 9:52pm

Also, if anyone wants to post a picture of one of their snakes with one of the above listed genes without saying what it is, I’ll test it out and see if it can guess

I’m working on pastel and Enchi now

saleengrinch · October 28, 2020, 10:07pm

This is freaking awesome lol. I just wonder how many genes you can stack in before it’s not effective. Also can you use it to tell if lesser and butter are the same thing lol.

chesterhf · October 28, 2020, 10:09pm

That’s what I’m wondering too, because at least in my eyes, some morphs seem almost indistinguishable. Also since I am using all pictures from MorphMarket, some have really weird lighting that makes them look brighter than they are, which skews the results a bit.

I have just been grouping lesser and butter as one, but maybe I should separate them?

saleengrinch · October 28, 2020, 10:30pm

I wouldn’t separate them. Same gene imo

chesterhf · October 28, 2020, 10:38pm

Update: I added Enchi and pastel and it’s still currently running at 95% accuracy. I’m actually pretty impressed with how well it’s differentiating some of these

However, I’m pretty sure I’m going to be hitting a wall with computing power before too long, so if anyone with a pretty solid gaming setup or a supercomputer wants to take over, I’ll export the project to you.

teddydalton · October 28, 2020, 11:26pm

That’s really cool. It will be interesting to see how it would cope with the subtle morphs like yellow-belly, redstripe or blackhead. Have you noticed a difference in accuracy based on the age of a snake? E.g. how some breeders will wait until a hatchling has had a few sheds and gained some size before definitively making an id.

saleengrinch · October 28, 2020, 11:30pm

I don’t think it would have a hard time with yellowbelly. It’s actually a fairly easy gene to identify. I think the hardest part would be between morphs that look similar. And multiple gene animals.

chesterhf · October 29, 2020, 12:29am

Yellowbelly has definitely been one of the genes I’m scared to take on, but I’ll give it a try next.

Surprisingly, I have not. I was worried this would be an issue as most hatchlings seem to fade somewhat as they age, however I trained it using both hatchlings and adults for each morph, so it seems to work pretty consistently across ages.

Currently it’s running at 95% accuracy guessing normal, pastel, butter/lesser, leopard, mojave, clown, enchi and black pastel, which is pretty good.

My main concern is that it treats each label as an either/or and doesn’t allow one image to have two labels. For example I have it trained to identify leopard, and to identify pastel, however if I upload a pastel leopard, it is going to have to choose between labels and will either classify it as one or the other. I would have to train it specifically to identify pastel leopards as their own label. Which makes it significantly less helpful, so I have to find a way around this issue.
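
To illustrate the either/or problem in general terms (this is not how Lobe works internally, which this thread doesn't establish): a single-label classifier typically picks the one highest-scoring label, while a multi-label setup scores each gene independently and keeps every one that clears a cutoff. The sketch below uses made-up raw scores for a pastel leopard; the label list, scores, and 0.5 threshold are all assumptions for the example.

```python
import math

LABELS = ["leopard", "pastel", "normal"]


def softmax(logits):
    """Turn raw scores into probabilities that compete and sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Score one label on its own, independent of the others."""
    return 1.0 / (1.0 + math.exp(-x))


def single_label(logits):
    """Either/or: return exactly one label, the most probable."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best]


def multi_label(logits, threshold=0.5):
    """Multi-label: return every label whose own score clears the threshold."""
    return [label for label, x in zip(LABELS, logits) if sigmoid(x) >= threshold]


# Hypothetical raw scores for a photo of a pastel leopard.
pastel_leopard_logits = [2.1, 1.8, -3.0]

print("single-label:", single_label(pastel_leopard_logits))
print("multi-label: ", multi_label(pastel_leopard_logits))
```

With these numbers the single-label version has to pick leopard over pastel, while the multi-label version reports both. A multi-label model still needs training photos tagged with every gene present, but it does not need a separate label for each combo.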

However, I was able to input a picture of an albino black pastel and even though I don’t have it trained to detect albino, it did classify the picture as black pastel, which gives me hope.

Given how much space and power this thing is taking, I’m going to talk to some more knowledgeable people about this tomorrow and how to possibly upload it to a server and then also make it accessible to others to play with, because currently I’m in way deeper than my skill level.

chesterhf · October 29, 2020, 12:46am

It wouldn’t let me upload a video but here’s a link to it in action using a picture I took of one of my snakes

erie-herps · October 29, 2020, 2:03am

You’d have to teach the program how the genes relate to and interact with each other. Here are my 3 suggestions:

A: Teach the program how every single gene works and interacts with every other gene. You’d end up with millions of possibilities, so it would probably be best to have breeders upload pictures of every hatchling, with all of its visible genetics, to a website. A program would read those and automatically update the master program, adding more and more genetics and complexity over time.

B: Have the program look at certain key areas on the snake. For example (not realistic), say mojave causes a certain head pattern in the heterozygous form that is visible in any combination; the program would know that, look at the snake, and divide it into sections: head, spine, sides, and the spine right behind the head. From there it would have a database of possibilities like the one you have now, except organized by area of the body. It would look at each area separately, then combine the findings into supers, standards, and hets.

C: You manually enter pictures with the full name, including all visual hets. This would take hours and hours just for a few names, let alone thousands, which is why you’d need multiple people working on the project if you went this route.

These are my ideas, you could mix them or try some different ones and see which one works best.
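
To put a rough number on how fast the combos pile up if every combination needs its own label, here is a back-of-the-envelope sketch. It makes a big simplifying assumption: each gene is just present or absent, ignoring super forms, hets, and allelic genes that can't all stack freely. The 50-gene figure is a hypothetical round number, not a count of real morphs.

```python
import math


def combos_up_to(num_genes, max_stack):
    """Count distinct gene combinations using 1..max_stack genes.

    Assumes each gene is simply present or absent (no supers or hets).
    """
    return sum(math.comb(num_genes, k) for k in range(1, max_stack + 1))


# The 7 genes trained so far (pastel, butter/lesser, leopard, mojave,
# clown, enchi, black pastel), stacked up to 4 deep.
print("7 genes, up to 4 stacked: ", combos_up_to(7, 4))

# A hypothetical catalog of 50 genes, stacked up to 4 deep.
print("50 genes, up to 4 stacked:", combos_up_to(50, 4))
```

Even with only 7 genes that is already close to a hundred separate labels, and a larger catalog runs into the hundreds of thousands, which is why labeling each photo with its individual genes scales better than naming every combo.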