Video Generation

Transcription provided by Huntsville AI Transcribe

So, welcome to Huntsville AI. I see one new face.

I think everybody else has been here, so welcome. This is an every-other-week get-together to talk about stuff. Hello.

Hopefully it’s fun stuff to talk about.

So there's that. Some updates of what's going on: the AI Council Task Force was going to put together an intro-to-AI session over at the Chamber of Commerce on Thursday of next week. That got canceled; it'll get delayed and they'll try it some other time. With that one, they're trying to figure out some way to raise the overall awareness and knowledge of what AI can do and what you can do with it, mostly focused on the general public. So, assuming that gets going, that could be helpful. Last I was talking to Carl, I think Rigvid over at the I2C at UAH was also looking at putting together some introductory AI sessions for business users, your end-user community, which is not necessarily what we gear towards. This is something I actually shared slides on at the last AI Huntsville Task Force meeting. Actually, let me go see if those are here real quick, and we'll come back to it. Nope, I did not have it up there. Okay. So what I had done was break your users down into three categories. The first one is just your general users of AI, whether they know it or not, because a lot of people are using things and have no idea that it's actually an AI model recommending the next movie for them to watch, things like that. The layer down from that has been our kind of core: your practitioners, or, I'm trying to think of the other word that I've heard, I can't remember the name of it. Doug Maddy, I think, was using, it wasn't practitioner, it was something, technician, I think. That's a normal one, yeah. Okay.

Prompt engineer.

Well, the person that actually makes tools for other people to use.

So you have people using the stuff, people making the stuff. And then the lower level from that is kind of your research level. And that’s where we just started off, thanks to Josh, this kind of a paper review series that we’re doing once a month to go even deeper. So that’s the plug for next Wednesday.

We’ll be virtual to go through some of the papers behind some of the things you’re going to see tonight.

So that’s always an interesting thing.

But we talked about that with the AI Task Force folks. And there's a giant amount of interest, and even people that want to host just kind of user-level AI session type stuff. So if you're interested in doing something, there's more there than I have bandwidth to cover; I'm not going to claim I can do it all myself. There are people wanting to host sessions to get folks to come in and talk to the general masses about AI, what you can do with it, fairly specific things.

Yeah. Is it just AI in general, or, like the topic of this meeting, are they going to break it down into subtopics, like AI for… video generation? Yeah, I think that's what they're looking for, somebody to put that out. Just in my head I can probably come up with 12 to 15 specific "AI for…" topics. Specific domains seem to be very interesting too: if you're doing manufacturing, how to use it specifically, stuff like that. Not just "here's ChatGPT, it can tell you a poem." Yeah, think about how AI would be useful to manage my inventory when I have stuff that has a shelf life, when I have stuff that has longer-lead items to order; there are ways to do that. How should I manage my finances for my business to cash flow? There are things for that. There are a lot of tools and things that are out there.

There was actually a question we had from the Chamber of Commerce. They're trying to find ways to help small businesses connect to the kind of AI tools that would help them out. So that's where I was: I sent over a couple of things, because a lot of times the tools they're already using have capabilities they don't even know about yet. If you're using Office, or if you're using Google Workspace, there's a lot there. And what was funny is, after I finished the email, I had Gemini, yes, make it more professional, and it did. And I sent it. So yeah, it worked really well. So if you're interested in doing stuff like that, there is a pretty good need for it, as well as probably folks that would be happy to host things. I'm still trying to see if maybe we take our off week. Right now we have one off week because we do a meetup, then a paper review, then a meetup, then a blank.

So in the blank, maybe that’s something where it’s not necessarily at this level, but maybe some where we can cycle through.

They’re getting a lot of interest because they see what’s going on at Birmingham AI. I don’t know if you’ve seen that on, hop on LinkedIn, go follow Austin at Birmingham AI. They’re doing monthly meetups.

They’re getting a hundred and something folks and whatnot.

And it’s like, well, this is cool. I’m like, yeah, we can do that. I mean, every time we throw pizza in the mix, you know, we have people all over the place that show up.

I don’t know that that’s useful for what we’re trying to do, but it is a thing.

So it kind of aligns with our mission of trying to make AI available and accessible to anybody that wants to play.

So just.

stuff running through my head based on conversations. We do have, on the 25th, a get-together with the American Planning Association, which I didn't know was a thing, but they are. Lee, the person that runs their chapter, actually works for the City of Huntsville as a planner. So he reached out: hey, can you come talk to us about what AI can do for planning? Like, yeah, I don't know it yet, but we'll learn it and we'll go figure some stuff out. There's a lot. Yeah, you can imagine logistics. You can imagine using image recognition to do some things. Trying to figure out how long ago it was, we were trying to connect with a guy that works for ALDOT on a contract; he's way down in Fairhope. They were the ones that had put basically a server in a truck with a camera on the front and figured out a way to detect where guardrails were missing or bad, just by driving.

So figure city-planning kind of stuff for that. Actually, backing up a second: at the last task force meeting, the mayor was talking about some company out of Birmingham that they're working with that figured out there's an automated way, if you've got imagery, to find city code violations. Not really that hard. You know, hey, this fence is down.

Hey, there’s a car on blocks. Hey, you know, I mean.

Not hard.

The hard part has been, how do you actually get video of all this stuff? And what they came up with is that there are a couple of specific kinds of vehicles that drive just about every city street on a weekly basis: they come and pick up your garbage and deliver your mail. So those are a couple of options they're looking at, though they may not deliver to your house, like mine. So there's a pretty good interest from the city, looking for ideas of ways the city could improve. So if you've got product ideas or things like that, not to get paid to develop it, but if you had a product to sell, I think there's a group that would be interested, especially if you could show the impact of it. Put it in dollar signs. I think it's not that hard to get folks to be interested.

Driving lessons. Driving lessons. Yeah. Yeah. But that was fun. But tonight, oh, back to the American Planning thing. Right now, I think it's me and Josh that are going to go. If you're interested in going and learning what these planning folks do, let me know; I can get you on the list. It is kind of interesting: they put out an invite with the link and the sign-up thing, and it also wants to know what I want to eat, my order and all this kind of stuff. I'm like, oh, it's whatever, and I get to the bottom and it's like, oh by the way, also here's my Venmo to pay for the thing I just ordered. Yeah. But it's a two-hour session from 10 o'clock to noon; I'm talking for the first hour, and then Dr. Neetha Menon from UAH, who was also at the AI symposium thing and had some really good stuff, is talking the second part of that. So that's fun. So what we're talking about tonight is video generation. Gen AI for video. This is stuff that I really didn't know a whole lot about.

We did a session February of last year when Sora dropped. And it was groundbreaking enough. We had to stop what we were doing and go talk about it because that’s all anybody was talking about was, oh, my gosh, did you see what OpenAI just dropped with Sora?

And it was pretty interesting. So as part of that.

I was listening to a podcast. I'm trying to remember; I was on my way home from somewhere. And I've got several AI podcasts that I don't have time to listen to most of the time. But when I'm on my way home from Mississippi on a six-hour drive, I can fit a lot of those in, basically at one-and-a-half speed or whatever. So one of them was with the person that developed the model called Mochi from Genmo.

It’s the first time I’d heard the phrase prompt adherence, which was kind of interesting. I never really thought about it, that a model, when you tell it something, if it’s a long-running thing, can drift from what you told it. In my head, it’s something my wife taught me, is if you’re talking to a three-year-old, you don’t give them three things to do.

You give them one thing to do. If you give them three things to do, they'll do the last thing and not the first two. So think of this model about the same way. If you tell it, do this, this, and this, you're figuring out what part of that prompt is actually going to stick through the whole thing and what part of that prompt might get dropped halfway through. So it may come back with one shoe and a backpack and no socks, you know.

I’m supposed to groom out of that one.

Yeah, right. So what we’re looking at doing, there’s several different modes here.

I really wish Ben was here.

Let me check to make sure he’s not online because I’m going to pick on him a little bit.

Flip over. I’ve got Chris, Jacqueline. Let’s see if I can see.

Yeah, I guess they're not.

Okay, well, back to where this was, because Ben is over here on the right.

This picture is from a remote NeurIPS meeting we did back in, I think, 2019 or 2020 over at Huntsville West. Basically just a picture from the back. I tried to make sure that it's not showing everybody's faces, you know what I mean, trying to be cognizant of that, except for Ben. I'm going to pick on Ben a little bit. Yeah, the CEO for my company is in this shot somewhere, which is… that's fine. I'm trying to think, black leather jacket in the front left, is that you?

I don’t think it is.

I think that’s a leather jacket though. You’d normally wear a leather jacket. I don’t know. This was a couple of years ago.

So anyway, this is the image that we’re going to use.

There’s several different modes for video generation.

The one you’re probably familiar with is I write a prompt and it makes video of what I tell it in the prompt.

That’s generally text-to-video.

Some of the models, like Moki, that is all it does.

It just does text-to-video.

And similar to early ChatGPT, I don't know how many of you have argued with the model for an hour or so trying to talk it into doing the thing you wanted it to do.

Yeah, imagine doing that with video.

No, really, please do it this way. We had this rollout with Gemini at work. And part of the difficulty was training people to ask it questions instead of telling it what to do. Right. Most of the models, and this is kind of a societal question that I brought up a couple of different times: at what point is the way that we talk to models, or prompt models, going to influence the way that we talk to each other?

Because I talk to ChatGPT way differently than I talk to a co-worker. Like YouTube, how fast they talk; they're doing it for time compression. People have talked really fast to each other since technology became mainstream, I think. I am very… Talking to a model, I tell it exactly how I want it, when I want it, and I do it enough. I don't know if anybody else… Anyway, the next time that you go talk to somebody and you realize, oh wait, I've got my model thing in.

Let me back that up.

The other direction can be true too. I've had people I've worked with, teaching them how to use AI, and they get horrible results. So I'll look at how they're communicating with the model and I'll realize, oh, they're not good at communicating with people either. It's about that. But, you know, personally, I'll throw some of my family members under the bus for it. I'll ask them what they prompted the model with.

And I realize it’s the same info that they give another person.

I’m like, yeah, there’s no guidance there. You left it completely open-ended. Oh, I could use that so often at work. Yes.

A quick thing for that is to ask them: hey, go to a subject matter expert and tell them the exact same thing you told the AI, and see what answer they give you.

And that gets it across really fast. Because you're not just talking to a person; it actually is their communication, so they can't blame the model. You can't blame that guy. I've been accused of speaking my own language. Yeah. But I do think it will affect some things at some point. I just don't know if there's anybody looking at it.

I’m sure there’s papers.

One thing that's interesting is, with school scores generally being worse on reading and writing, there's been a lot of struggle for, I guess, Gen Alpha, interacting with these models because of poor communication skills.

If you look at bigger studies of Gen Alpha versus Gen Z versus Millennials interacting with these tools, as you go farther down, it ends up being worse because of the communication skills you're talking about. It can also potentially go both ways: if I grew up with these models, like you're saying, I never learned to communicate with humans super well.

I just think in model speak the whole time.

Right. It’s already a different language that I don’t know.

It does bother me a little bit that you just listed three generations and I’m not one of them.

Thanks for that.

Anyway, yeah.

So what we decided after the Mochi thing: I started looking into it and found some cool stuff. And then Josh was working on some other things, and we were talking about the WAN paper. I didn't even know what WAN was or anything. And so we took a look. And dang, yeah, it's changed a good bit, which AI is apt to do on a monthly basis. I mean, it's been a whole year.

So what we did, we took this image. Oh, here.

Got it.

Make sure somebody's not still stuck outside. Yeah, yeah, yeah, yeah. So we talked about mail, right? So apparently my Bank of America statement was delivered to my neighbor. Luckily, we met. So, fun stuff. But we took this and threw it in. Wow, I wish this was better; I did a horrible job of formatting this. The prompt we're using is: a cozy, informal classroom filled with natural light. Around 15 attendees sit on mismatched chairs. Initially, I had told it kind of what we do from a Huntsville AI kind of thing, because I wanted it to take this image of an actual meetup and turn it into a video of kind of what we do at a meetup. So I prompted this in. I think I was using the WAN 2.1, whatever that thing is. They've got a nifty little thing that says, would you like me to expand your prompt? In other words, I gave it like a sentence, and it comes up with a full paragraph that's much more precise about different things. So it was kind of neat.

I didn’t even realize that I could go natural light versus other light. So that in itself was kind of neat. Kind of using one model to help you get stuff to put into another model is not new, but caught me off guard. Some taking notes or holding coffee cups. I didn’t even tell it that. Normally, we have at least one person with a laptop and at least one person with a coffee cup.

I don’t know if I’m a confident speaker or a whiteboard.

Whatever, speaker engages, answering questions. Apparently, I react naturally with hand gestures and facial expressions. We’ll see. Smart, short, thoughtful exchanges.

So, yes, we've had that. The mood is, whatever, with a modern startup-like atmosphere. The first time I did this, I don't even know if I've still got the video it generated. The prompt basically said they sit on mismatched chairs, and it had the phrase "and beanbags." And the video it gave reminded me of something you would see about a person starting a cult. It was basically me talking and a bunch of people sitting on beanbags. It was weird. Yeah, anyway, so I took the beanbags out, and now it's got actual people in chairs, so that's good. We're no longer in cult territory.

Now we’re back in just normal Kool-Aid startups. Oh, wait, that’s where that came from.

My bad.

So we started off looking at three versions that are available, because of this Mochi 1 model.

The Genmo, I guess Genmo is the company or something.

They make that available for free so you can go play with it.

The WAN 2.1 model is also available in a workspace for free.

You can go play with that for free.

So we started off with that. We tried Sora.

Sora actually won’t let you create new accounts right now.

I think I even have an account through our company thing to use it. It just won’t create my username and let me in.

So, yay.

So, that was off. We were wanting to look at Veo 2, which is the Google version.

I even went through and created a Google whatever the thing is that you get into Vertex AI and stuff.

And then found I had to get on some other list to get access to the model to do stuff. I dropped a hint to my Google co-working friend, who didn't get the hint, but several other people found it pretty funny. So there's that. So starting off with Mochi, and I guess I will click the link in a second.

This is just a text to video generation.

It will let you generate a limited number per day, so you can come in tomorrow and generate some more videos, things like that. It won't let you start from an image; you just give it a prompt, it gives you a video. It seemed to be pretty quick at turnaround time. They don't have a way for you to pay them, which means you are the product, in case you were wondering. They are using you for user feedback to see how well their model is doing. So free testing, and we'll cover some of the things that you'll see these models do. There's only one that I think has some really, really wacky stuff. So let's go ahead and open yet another tab. So this was the model playground they've got.

You can describe your video. I started off playing around. This was when I was trying to do Trash Pandas with rockets. So, fun stuff. The next one, I gave it the prompt that we talked about earlier, without the image. Fingers are weird. Facial stuff, whatever. Is it limited resolution or frame rate right now? Or were there any choices with that? There were no choices.

One of the things we'll talk about later when we get into the ComfyUI part, if I've got time: when we get into some of the workflow stuff you can do, you can actually take your output, drop that into an upscaler, drop that into, I mean, there's a whole chain. It gets overwhelming.

If you will.

If not, it's okay. But do you want me to run any of your prompts through some of the other models you didn't get access to, and email them to you?

Sure.

Do you want to just email me one of the prompts or type it out if you want?

You’re on the, actually, let me go find the thing.

There we go. So for those that have not been on our Discord channel that we’ve got set up with Tech256, feel free to hop on.

There's a lot there; I mean, we share this stuff in the channel all the time. There's the prompt. The whole thing I was trying to do later, if I get back over here, was: I added a section to the end of the prompt after a while, because I had Ben over here sitting down and I was trying to get Ben to raise his hand. So at the end, I said the person on the left in the purple shirt or blue shirt, I think blue shirt, raises his hand. And so I tried that. I got some interesting results, which we'll hop into.

But for like the Genmo playground, if you’ve got a, let’s see, do you want a prompt idea or do you want to give me a prompt to try?

All the cows on the moon.

Cows on the moon. What did we do there?

With rockets.

It had flowers in it. Yeah. How well is it keeping the characters from one project to another?

Have you found out a way for it to keep the same person?

Like a use case for me would be to have a narration for a commercial.

for a medication, and they've just got these generic storylines, you know, because they can't really talk about what the medication does, but they can show, you know, a tractor getting stuck in the ruts and having to pull the tractor out of the ditch. They've got these storylines with the same character.

Yeah, the same character. It goes through six scenes. And the same colors.

That’s the other thing I’ve noticed lately.

It’s been interesting because I’ve had trouble with images passing through. It’s getting a little bit better. Because a comic book sequence would be another good test.

Can I create a comic character and then I put them in several scenes and just do pictures? You know, the same kind of thing.

That’s something that I’ve been kind of watching them get better at. So you should look at WAN. They specifically trained on character recognition and image. They actually started with an image model as the base model, and then they turned it into a video model. So it can do image embedding really well. Gotcha.

And is that WAN?

W-A-N 2.1.

2.1.

Image to video.

It’s next on our list.

Yeah, it’s really good.

You’ll see it in just a second. We can just let this thing finish going, but the reason I want to do some of these live is to give you some idea of how long some of this stuff takes to actually go and run and give you the video. One thing I have found on the free tier of everything is that you are using some kind of a shared resource. I have found that apparently people that generate videos work later.

in the evenings. So if I take, you know, 15 minutes on my lunch break, you go do something, it goes pretty quick.

If I log in during the 10 o’clock news and try to get something kicked off last night, I got put into a queue for over an hour before it generated.

Yeah, we'll see what that comes back as. I mean, that was the Genmo one.

I’ll close that one out.

So the other one we were looking at, WAN, actually also has a free tier.

So WAN is the first one you'll see that's got text to video, which is kind of what we just did, where you type the prompt and it gives you something.

Or it’s got the image to video as far as what this allows you to do. And this is where you can upload your own image. Like for me, I was uploading and this is going to be a little messy because I’ve been doing this a minute. It should be something called NeurIPS Meetup JPEG file. Okay, so there’s that one. So that uploads and then I can give a prompt of what I want to see.

Let’s see, I’ll start spotting.

Wait, a riot erupts? Go for chaos. Always chaos.

The music starts and they start doing the electric slide. That’ll be next. Lots of motion. That’s got names written all over it.

And waving. All right, let's see. And Jay, would you mind sharing that picture in Discord, too?

Or do you want to keep that off there?

Go to hsv.ai. It’s on our homepage.

Oh.

Gotcha.

So this one, estimated time is eight minutes and change.

So is it like a video generator or more like an animated GIF generator?

It’s video.

It’s video. This one, I think, is 16 frames a second. By default, it generates 81 frames.

I think it’s 16 frames a second.

Some of them can do different frame rates than others.

You can get into some crazy parameter tweaking, like nuts.

If you can actually get down and mess with the embeddings and do stuff, then you can get them to extend out very long and do stuff like that.
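
To put numbers on the defaults mentioned above, 81 frames at 16 frames a second is right at five seconds. A quick back-of-the-envelope sketch; the frame count and rate are just whatever your model or workflow exposes:

```python
# Quick clip-length arithmetic for the defaults mentioned above.
frames = 81   # default number of generated frames
fps = 16      # default playback rate

print(f"{frames} frames @ {fps} fps ≈ {frames / fps:.2f} seconds")  # ≈ 5.06 s

# Going the other way: frames needed to hit a target duration at that rate.
target_seconds = 10
print(f"{target_seconds}s @ {fps} fps needs about {target_seconds * fps} frames")
```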

Do you want to talk a little? I've got… a minute or two about LoRA and some of the things people are training and hooking into this to do some very specific types of… The ones I've seen, there's a specific kind of animation. Yeah, sure. So the nice thing about WAN is it's really easy to train.

The paper is completely open source.

They say like everything, here’s our architecture, here’s how we trained it, you know, all the information you could want about it.

And they've had it split up so that you can train different parts of the model. So, for example, the camera motion is its own little segment; you can train, here's a pan left, and it's not going to mess with your character stuff, it's not going to mess with the image-generation stuff. But it's actually training out a full 81 frames as one thing that it's generating. Before, a lot of the time, it would basically generate an image, an image, an image, an image, and try to stitch them together and do interpolation. It didn't really understand why this image has this and why that image has that. But they have what they call a temporal causal variational autoencoder, which basically means it's able to take spatial and time-based information and learn the causality that requires those things to happen.

So that’s why you see it is much more consistent with motions and things like that, where stuff still happens.

It’s still very early, but it’s doing something different than it was a year ago with all these things.

That's kind of what's really cool. It's easy to train, and you can do it on 4090s, stuff like that. I think the smallest setup that you can run this with is like 8.5 gigabytes of VRAM, which is not a big card.
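
To make the LoRA idea concrete: a low-rank adapter freezes the original weights and trains only a small A·B correction on top, which is why it fits on a single consumer card. This is a generic, minimal PyTorch sketch of the pattern, not WAN's actual training code; the layer size and rank here are made up for illustration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a small trainable low-rank update (the LoRA idea)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the original weights
            p.requires_grad = False
        in_f, out_f = base.in_features, base.out_features
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # down-projection
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))        # up-projection, starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        # Original output plus the low-rank correction; only lora_a / lora_b get gradients.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Hypothetical usage: wrap one projection layer of a frozen video model.
layer = nn.Linear(1024, 1024)
adapted = LoRALinear(layer, rank=8)
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
print(f"Trainable LoRA params: {trainable}")  # tiny compared to the frozen base
```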

Yes, some of the things that I used to see, like objects going through other objects, or you have a light post or something and a car passes, and after the car leaves it's gone, because it missed the whole part that was there before. Because if you're doing image to image, after the car covers up the light post, there's no memory of that

in the next image. So it’s, anyway, there’s a lot of that stuff.

This is still running.

We’ll let that roll. Again, they’re free. I wonder about, I mean, in cases where there are those edge cases where something goes horribly wrong.

Yes.

Someone’s hand goes through a wall and that wasn’t a quantum miracle. Yes.

Could you pair it up with SAM, Segment Anything, to be able to highlight, okay, I want to look at this particular part of the scene, and then say, change just that area?

Absolutely.

It's actually called VACE.

It's the system that they've, they actually just released that today in open source. And they basically paired a whole bunch of different extra tools, like Segment Anything, definitely that, but other things that are doing checking; they'll pair it with something like Qwen: is there something in here that's wrong?

And they’re training it on that sort of adversarial relationship so that you can, it’s going to do bad generation. It’s going to sample the wrong parts of the thing. But then you can then say, okay, maybe make that guy’s head not disappear and have that conversation with it. And it actually does it. That’s useful.

They’re training it to do that for sure.

Right now, you absolutely have to check the work it does. We're going to show some stuff in a minute that is some wackiness. So again, the process takes longer on the free tier, so far. Let me see if it's actually completed.

Yeah, so here is, here’s your trash pandas, cows on the moon, rockets and flowers. So, Fun, you know. It is.

Mochi's been special.

Yeah, it is. So we'll let the WAN one keep running; it's got a little bit further to go. So again, for the 2.1, the free one, there's not a paid thing I could find on that particular site.

There are ways to get credits: you can publish your stuff on their site if you have something neat, which is great for them, free publicity of cool stuff, and it also gives them some intrinsic kind of clues. If you're generating a bunch of stuff and never downloading it, it's probably not good, you know. So there are some things they can take out of that even if you don't rate the results. Moving into the paid stuff: Runway ML. And sorry for the duplication on the picture thing. Yeah. This is the first one I came across that has a paid option.

And all of these operate on credits. I feel like I'm back at the county fair and nobody takes money; they all want tokens or tickets or something. I have no idea what stuff costs anymore because I'm not using cash. Similar here: they all want credits. And a credit means something different for every different platform you try to run this on, so 25 credits at this one does not mean 25 credits at that one, and whatnot. So, 25 credits per five seconds of video. Anybody want to do 625 divided by 25 in your head? 25, okay. So for $15 a month you can make 25 short videos, a few seconds each. Is there any resolution choice? Let's look.

So I did a couple on here as well. I think Runway, at a certain tier, has like a relax mode or something like that where you can do unlimited video, so it's kind of like Midjourney. But it's not the lower tier. The lower tier, it's like you get the credits and that's it. Oh, yeah. On theirs, when you get into Unlimited, it's $95.

So if this is your job, it may be worth it.

I don’t know how many videos you have to make. It’s pretty good. I take credits. I’m not spending $95. I know. Sorry, wait.

That one’s that one.

Here we go.

It’s too good that I need it, but it gets the message a little better.

Yeah, so this one has two different options, mostly whether you're going vertical or horizontal. Let's see if it'll actually show the sessions I did already; that's what I was kind of hoping for. The other thing with this one, actually, like Josh was saying earlier, some of these have a little bit of a separate model that lets you do things like camera control. So if you've got a speaker standing in front, you can say the camera pans to the left while centering on the whiteboard, or something like that. You can actually do some things like that. It's pretty interesting.

Actually, let me close this window, see if it'll actually let me get back to the sessions I had. Oh, here it is. But back to WAN: this was what it generated.

So apparently, yeah, there’s a person. I do not see a riot. I don’t see a riot. Wait, is this the one we did the riot? Looks like it.

Yeah.

No, wait, this is actually… Oh, no, that was the one I did before.

Okay.

So yeah, there's definitely not a riot. There is hand-waving. So that was kind of boring. Close that out. That took over eight minutes, yeah. If I go over to Runway, I'll have to talk a lot faster. Welcome back, remember who I am. Recent generations, something similar. You can kind of tell this one is more geared towards, well, this one does seem to have more of a handheld camera motion type thing just by default. I didn't tell it to do anything like that. One of the ways I was looking at this was actually a ten-second one that I generated, which took twice as many credits as the five-second one.

Go figure.

I do like the depth perception.

Yeah. It’s pretty interesting that he gets the reflection on the left side.

Yeah.

I mean, this one was pretty interesting. The other thing: a lot of times when I'm posting things about sessions, I'll take a picture or whatever, and any time I post something on LinkedIn with a picture, it gets more interaction than if I didn't use a picture. It is super easy now to take a picture and turn it into a five-second video clip.

It's probably going to get even more interaction until everybody else does it, and then whatever. Are there any of these where you could upload a model, like a glTF model or something?

Like a 3D mesh?

Like a Blender mesh? I think there are some people who are doing stuff like that, but nothing that's productized right now. I think Microsoft has done a lot of stuff, and I think Alibaba too. As fast as AI has been going, something like this is probably going to pop up on Bing image generation or something like that in about a month. Sure. There's that one.

I do want to keep this moving, because I do have one of these things running local. Not local local, but Kling AI. Again, more credits, different prices; everything here ends in .99.

So, yay.

It’s a deal.

Yeah, it’s a deal. It’s a steal. This one might be the one that had the funky, do I have, oh I have this up already. Let me go over here.

Yeah this is probably it. The person on the right in the blue shirt raised their hand.

I don’t know if this is the one that you would definitely want to check your work before submitting it anywhere. That one must be the other.

Let’s check this one.

Still no. I get a little bit of something that looks like a hand raise, maybe, but not really.

So we’ll close those out and go back. Maybe it’s this one, which is over here.

So this one, again, everything into nine, and they have credits as well.

Yay credits.

This is, yeah, this has, this maybe has to be it.

So, yeah. There’s a hand.

There’s a hand.

Couple of interesting things.

A hand may look more like a salute than a hand raise, depending on what you’re looking at. Notice when the guy raises his hand, the guy next to him to his left, watch his head. It just goes away. It disappears. Well, it’s like it disappears for maybe a frame or two, but it recovers.

Maybe.

I don’t know.

It looks like it moves it over to his shoulder. Yeah, I’m just super impressed.

And then the way we’ve got three-dimensional, you know, people are moving is amazing.

how we’ve got that.

You actually have set up shots to do things like this.

You know how many people it takes and what camera and all that kind of stuff. It's training the camera as a part of the model, where it has its own place in there. Another interesting artifact here: our buddy Todd, who was in this shot in the white shirt, apparently his beard turns into a hand with fingers doing something weird. I don't know. You don't notice it the first time you see it. So if this was all just something scrolling through… Right around here. And like you said, maybe… I don't know what happened to your face, man. I'm sorry. My apologies. Maybe one of these tools gets your concept, and then the next tool does your high-resolution upscale and what you're talking about.

Yeah. So, you know, and I think that would be a great use case.

So I get my ideas here, and then I put it into something that’s really going to give me at least 1080.

basically bridge the gap in the frames.

Yeah. And really spend, you know, spend some time.

They trained a smaller model to do that.

That’s awesome. And you basically do like a depth map.

You can do like different sort of preprocessors and change the retexture and all that sort of stuff. That’s cool. Very cool. That’s something to think about, you know, tool chains or stuff like this.

You are queuing me up.

For 12 minutes. Product thoughts.

Each one of these has a slightly different niche. You've got some that are higher res, some that actually let you do much better with camera motion, things like that. I don't even know how to calculate the difference between their credits and their prices and their stuff because none of it matches up. So if it works for you, great.

If it doesn’t, don’t use it. So one of the things I’ve seen folks doing a decent amount is more like what you were doing.

I had a similar conversation last year with Ben, who does a lot of Roblox games and things like that, to the point where I think that's his job now. He did a lot of that stuff. Really bright guy. Used to come to co-work, not a lot. Super bright guy. A lot of what he was doing was using either Gen AI video or the text-to-speech stuff to just kind of walk through a plot or get an idea of something before he goes and hires actual actors and people to do the thing. That way he's got it in his head and he knows how the dialogue works. He knows what the shot looks like and all that kind of stuff before dropping a boatload of money to go actually make it. Super good for that. Let me jump real quick over and see if… Yes, I put six in there for the different… Oh, my gosh.

Okay.

That’ll be quick.

Sora, just a text prompt. Sora is funky.

Yeah, I don’t know. But faster than what I just had claimed. Just a text prompt. So do you like have stuff for all of these?

So I have access to Kling through Hedra, which I got for their Character-3.

That’ll be coming up later.

I have access to Higgsfield. That's one of the newest ones I've found, which is very impressive.

Oh, that’s much better.

Okay.

This is me running out of a room somewhere.

Look at me go. I come in from the right. I might be dancing. I’m not quite sure. Really cool thing with the Sora adding the image prompt. Watch what happens.

I don’t know if this is the one with the mirror or not.

Which one?

Yeah, try that one. This one? Okay, so look at the mirror reflection. It tries to get the reflection. I thought I noticed that earlier. I didn't even realize there was a mirror over here. Well, it was the TV reflecting off. Oh, yeah. I didn't notice. I thought I noticed that with the other samples, where it was on the back of somebody's head. This one might have audio. You have to feed Character-3 with audio.

Okay.

Oh, this is interesting.

Well, my problem, I can’t get on my speaker.

That’s a comedy.

It will feed back like crazy. Will these stay here? We'll look at them later. Will these stay on here?

We’ll look at it later.

They’ll be on Discord. It’s on Discord? Yeah. I don’t know if you’re on Discord.

I’ll get with you and get on it.

Okay. Yeah.

It’s easy. I can do it. I learned how to have like a video call two weeks ago.

That’s the stuff I should know.

I will say Higgsfield gives you enough free credits to make like two videos, so anybody can check it out for free. Oh wow, okay.

Pretty cool.

So back to, I got a little bit.

So we started off with that. I started off on RunPod, which is a place where you can create your own VMs and spin up a virtual machine with a graphics card, and all kinds of different levels of graphics card. I actually spun up a virtual machine last night with an H100. I didn't even know I could do that.

It did cost me $3 an hour. What?

It’s an H100.

It’s an H100 with 96 gig of RAM?

I mean, it was a… Something that we never could have done a year ago or a little more than that.

These were just basically my running notes as I fought with this stuff to try to get it going. Let's see the main notes. If you go to RunPod, you can create an account pretty easily.

You can spin up a, let’s say I want to deploy a new thing.

This isn’t necessarily what we’re.

talking about, but like a 4090, which is what I’ve been using mostly, create a 4090.

You can pick all kinds of different templates for different versions of PyTorch or different versions of, I think, TensorFlow, or the one we're using, which is called ComfyUI, which I didn't know was a thing until a couple days ago.

So, yeah.

There are templates for just about any kind of AI workflow that you are normally going to want to do.

So super simple to create.

There are also things to watch out for: your container space and your volume space. Because if you're going to be downloading models that are 40-plus gig and you by default have a volume space that's 20 gig, you're basically going to waste about 15 minutes of your life and then have to go do it again. The note that I captured, whoop, jumped too far, is that you can actually up the space. Basically, you can edit the pod and crank up the space; it will shut down, crank up the space, and start up again. The thing to remember is anything that got downloaded halfway is still only halfway, but it's hard to find which ones are halfway and which ones are not. So there's that.

I lost an hour or so fighting with that.

I spent a long time trying to get the base model from WAN 2.1, or whatever this thing is. It's on Hugging Face. They give you a description of how to clone it, what the requirements are, how to install things, how to download ModelScope if you want, a single Python generation script, or, further down, basically running it with Gradio. And what I learned, let's see, I had a lot of problems getting it to use the GPU. That wound up being a difference between the CUDA version on the machine and the CUDA version that was in my template; things didn't like each other. The other thing, somewhere in here I did mention that mostly what I did with WAN 2.1 out of the box was run out of memory on just about everything I tried it on. By default the model is 14 billion parameters at 32 bits per weight, and that takes a heck of a lot of memory. Yeah. So there's that.

Then I learned that most people don’t actually do that.

Most people use the floating-point-16 version of it, at which point you're at 14 billion times 16 bits per weight.

So I need 28 gig of RAM, not 60 or whatever it wound up being.

So that gets small enough.

You can run this stuff on like a 4090.
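
The memory math here is just parameter count times bytes per weight, ignoring activations, the VAE, and the text encoder, which add more on top. A quick sketch:

```python
# Rough VRAM needed just to hold the weights: params x bytes per weight.
# Ignores activations, the VAE, and the text encoder, which all add more on top.
params = 14e9  # WAN 2.1's larger model, ~14 billion parameters

for bits in (32, 16, 8):
    gb = params * (bits / 8) / 1e9
    print(f"{bits}-bit weights: ~{gb:.0f} GB")
# 32-bit: ~56 GB, 16-bit: ~28 GB, 8-bit: ~14 GB
# (roughly the "60 or whatever" vs "28 gig" numbers mentioned above)
```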

So I started playing around with that.

Same thing, I wound up finding the same thing with Mochi 1.

So real quick, we will jump into, oh, also, in case you run across it, there is a Wan2GP,

which is basically a quantized version in a separate project, not related to Comfy or anything like that. I was about five minutes into it and I checked the license and saw that he had taken the license from the free thing and turned it into something different, which is why I'm like, no thank you, bye. I mean, it basically looked like he took an Apache license and turned it into non-commercial for some reason. I don't know. So there's that. If you start up a RunPod and just pick this ComfyUI thing, there are some instructions on how to do it, but ComfyUI, I'm not sure if I've, let me see if this actually has everything I need. Maybe it doesn't. I may be out of time anyway.

We may have to pick this.

You may have to trust me for a second. So what Comfy UI does is it provides, and this is what a lot of the people that are doing diffusion models for imagery, for video, and stuff like that, and they’re doing a lot of this. This is what a lot of them use as far as I can tell. Generally, the documentation is fairly good on how to get it working and all of that.

My problem right now is that I’m too cheap to just leave this thing running overnight and whatnot, so I shut it down. I tried to get it spun up before we got into this session, but I didn’t have enough time. What this is, is actually, if you see some of this in here, here’s actually the text prompt. That is the problem.

I didn’t make that up, by the way.

You can see for yourself.

Cute animator. Let’s actually see what my… I can’t leave it working by itself.

Yeah, you can't leave it alone. Facebook page. Oh, this is… We do video these, so this is going to be fun. I'm looking for a UNet loader.

It’s right above.

Yeah, yeah, this is a very, yeah, the diffusion model right there. Okay, so. So it came with that text. It did. Sure. So I may not have a, actually.

It doesn't have anything now, if you're doing the fresh ComfyUI image. That's kind of how you link things together in Unreal Engine.

That’s what I was about to say.

Yeah, it’s a node-based engine. The nice thing is, like, so you’ve got all the papers, and you look at them, and it has all these blocks and sort of stuff going here.

And this basically lets you split it up. So it's like, okay, I want to change this to half, I want to change this to 75 and see what this does and what that does. It might be that for your use case, you can optimize things really well in there and do weird stuff with it.

Okay, there’s the example image they gave us.

Let’s see if it’s actually… Yeah, yeah.

Does anyone here use Unreal Engine?

It’s… It’s right.

Let me go change. Creepy method.

I'm using it without a mouse, which is hard. So right now it's loading stuff into the CLIP encoder. Actually, I don't know if it tells me what it's doing over here. Yes, it's off and working. This particular one, it only shows up up here: 62%, image to video.

Eventually, it will hop over.

Where is it?

The sampler?

I think it’s the sampler since I’ve got the preview on.

I think that’s where it shows up.

I remember how to move this out of the way so when it gets there.

So this is the thing a lot of people are using: take the image and drive it to the encoder, then do, you know, the text prompt to the encoder, and then to the model, then the sampler, and the sampler out to the images.

That's where you get images, and that goes out to this other block to actually, in this case, get a WebP or whatever this thing is.

Same item, this actually decodes back into here.

So some of the things I’ve seen with some fairly interesting workflows are the kind where you make one video and then you take the last image of the last video and you use that as the starting point of your next five second shot.

And you can do some interesting, I mean, like, seriously, take your storyboard, drop it into this kind of thing, and frame to frame to frame to frame. Beautiful.
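
That chaining trick is easy to script outside of any particular tool. A minimal sketch with OpenCV; generate_clip() here is a placeholder standing in for whatever image-to-video model or workflow you're actually calling, not a real API:

```python
import cv2

def generate_clip(prompt: str, start_image: str) -> str:
    """Placeholder: call whatever image-to-video model/workflow you actually use
    (WAN in ComfyUI, a hosted service, ...) and return the path of the rendered clip."""
    raise NotImplementedError

def last_frame(video_path: str, image_path: str) -> str:
    """Grab the final frame of a clip and save it as the start image for the next shot."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, total - 1)   # seek to the last frame
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError(f"Could not read last frame of {video_path}")
    cv2.imwrite(image_path, frame)
    return image_path

# Chain shots: the last frame of each clip becomes the starting image of the next one.
prompts = ["speaker walks to the whiteboard", "an attendee raises a hand"]
start = "neurips_meetup.jpg"
for i, prompt in enumerate(prompts):
    clip = generate_clip(prompt, start_image=start)
    start = last_frame(clip, f"keyframe_{i}.png")
```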

And you can integrate in, like, LLMs, too.

So you can take, like, one text prompt and then say, okay, take this text prompt and generate, help me out, five prompts for a storyboard of this.

Then generate images of those and use those as keyframes. So you can kind of start doing that sort of stuff.
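
Same spirit, with the LLM step in front; ask_llm() is a stand-in for whatever chat model you have wired up, and generate_clip() is the same placeholder as in the sketch above:

```python
def ask_llm(question: str) -> str:
    """Placeholder for your LLM call of choice; expected to return one beat per line."""
    raise NotImplementedError

def storyboard(idea: str, n_beats: int = 5) -> list[str]:
    reply = ask_llm(
        f"Break this into {n_beats} short, concrete video-shot descriptions, "
        f"one per line: {idea}"
    )
    return [line.strip() for line in reply.splitlines() if line.strip()]

# One idea in, a handful of ~5 second shots out, one clip per beat.
beats = storyboard("a meetup group demos AI video tools, ending with applause")
clips = [generate_clip(beat, start_image="neurips_meetup.jpg") for beat in beats]
```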

That’s a hot mess, but you’re doing something crazy. You’ll actually see the diffusion model start to pick up on this. This is a 512 by 512 piece in the middle of this whole kind of a workflow thing.

So we started off with this really funky image here, and the prompt was: a cute anime girl with big ears, I guess, I don't know what that word is, and a fluffy tail, wearing a maid outfit, turning around. So… Sure you don't. Yeah, this is definitely… So this should be entertaining if nothing else. And it's also why you don't do this live in a video that's also recorded. Like we do.

So it's been somewhat of a learning experience, not only learning some of the models and how they work, but also this new tool, which is the normal workflow for a lot of folks doing this.

If this actually works, which is 70-something percent or whatnot.

Oh, the other fun thing with RunPod, it actually gives you some clue what may be done.

If you go back to my pods, the one that’s running, it should tell me that it is pretty much loaded my GPU up and it’s using nearly all of the memory on it.

So anyway, that’s the video it made.

Yay. MS Pink, coming at you. Pretty much. But the other thing is I don’t know how long that took to generate.

I mean, we’re doing that live. Oh, no, you’re running pretty much cold.

Yeah.

Usually they take about five to eight minutes, something like that, if you’re really optimized.

Right.

And I don’t have any of that in here.

And I’m doing this at less than a dollar an hour. So this is, what, 67 cents an hour? I’m not sure.

Does it show?

So if you go to – I actually sent you a message.

Oh, 69 cents per hour.

Yeah.

I sent you a message in Discord, a private message over there.

Oh, thank you, Dr. Mike. So this is one that somebody gave me. He gave me an image that he was working with earlier today. If you play that video here. This one? Yeah. And it’s talking about like he stops to look and make a plan and then he accelerates with the dust behind him. Oh, nice. And so it can generate that in about eight minutes.

And I just do an LLM prompt to get the storyboard out, and it says beat one, beat two, beat three.

And I just feed the beats. I’m not even doing like additional stuff. I’m just saying that those are the beats that are happening inside of it. And it knows how to space them temporally if you have the right sort of configuration on it.

It’s like an image out of Halo, but it’s not Halo.

Yeah. I mean, they’re doing complex physics stuff. I mean, that dust going behind it was right, essentially. Well, the mirror image. I mean, I didn’t even notice that until somebody mentioned it.

The guy walking in front of me, I’m like, wait, what? So it’s pretty interesting.

Yeah, I closed that before. So let’s see.

Let me close that.

Yeah, I don’t want to save it. Okay, that’s what we’ve got. Let me check and see.

So we’ve got several others that joined us.

So if you're online, if you've got questions or anything, drop them in the chat and I'll see if I can cover them. There's some issue with mics or something, with us being able to hear anybody online. So some of the other interesting things to think about: when ChatGPT and some of these other Gen AI models dropped, it scared enough people to go make a writers' strike and things like that to try to keep their jobs. You can imagine the same kind of thing applying to all kinds of fields:

I can make this now for my own business; I don't have to go get an ad company or whatever. We're super close to where that's a thing, where there's a product that's less than 50 bucks a month to gen whatever you want, with your own people, with your own logos, with your own, you know. I'm interested to see where that goes. Stable Diffusion, you know, the answer to… Actually, it's not on the screen. That would be helpful. It's a question from Lauren about what diffusion models people are using underneath. So Stable Diffusion is dead. I think Flux is really the big one right now, and now WAN for video generation. There are other ones that are more niche, but those are kind of your safe bets. Oh, the other thing I didn't show. Back on this fun thing. For workflow, I can go to browse templates.

I think there’s some Flux stuff in here.

Yeah. Yeah, Flux really is good for commercial-realism, marketing-style stuff, I think. It does text well. Yeah, I mean, I just… There's something kind of handy that I've been looking for for work and I can't find it. You guys remember node network maps, like tracing for Kubernetes, like in the New Relic interface?

I will show you the little plots and stuff.

This was kind of sort of the same idea as a stack trace, but you could see things moving through that kind of spaghetti diagram for the network.

I can’t for the life of me remember what the name of that model is.

A network node map is as close as I can get.

It’s like an animated node network map.

Not sure.

I feel like there’s lots of things that are there.

I mean, it’s the whole world of DAGs, anything that’s a kind of sort of a DAG builder where you can watch it go through.

Directed acyclic graph.

I mean, I just look for stuff around that.

Like an IP traffic flow thing, kind of showing connections of data from one machine.

You’re seeing stuff move through the network.

I think I know what you’re talking about. Yeah.

And now, of course, there’s probably 800 of them. Different names now with the same thing and searching is hard.

This is probably what that was trying to do. Yeah, that’s the logo, I think. Okay. Comfy.

So, again, upscaling.

Let’s say you made some videos.

There’s other kinds of nodes in Comfy by default. to do that kind of thing.

So if you needed to make a higher res image or something, I haven’t played enough with it to just start willy-nilly dragging stuff in and doing the things. Because the first one I ran into, I don’t, it’s not.

loaded in part of this. There's another person that made a wrapper around the WAN 2.1 model and did a whole bunch of stuff, but it exposes nearly every parameter in the model to let you change and configure, with not-great defaults.

So the first thing I made was like a brown blob. That's the trap: you've got all the stuff exposed. They've done all this stuff to make it nice and usable, like ChatGPT, but if you start messing with the parameters, it might not be trained for that. You're going out to the weird spaces.

So if you like the weird spaces, it’s available and it’s not hard to get in the middle of some things where you change the number and all of a sudden nothing works.

And the problem is it’s not like it’s version controlled where I can now go find the number I changed from last time.

I’m sure there are ways to do it.

With that, I guess it’s a good time to plug the paper review that’s dropping next Wednesday where you get to learn more about how all of this stuff works. I can talk a little bit about what we’ll be going over. Yeah. So next week, we’re going to be looking at the one paper.

So a lot of what we’re looking at is variational auto encoders, we’re going to look at kind of how diffusion models work. So we’re going to do the same thing we always do, like quick first half is like, what the heck is all this stuff at a high level, and then kind of what this specific thing is doing. So we’ll look at variational autoencoders, diffusion models, we’ll look at what’s called flow matching, which is really a very important part of all this, which is basically how they make all the diffusion stuff super easy. And then we’ll talk about some of the spatial sort of stuff we talked about tonight. And then the second half will go through and look at like their architecture and really looking at, there’s a lot of questions you probably have, but like, okay, but how does it do that? How’s it doing the camera stuff? How’s it doing the character stuff? Can it do this?

And the WAN paper is awesome. This thing is really great because they go through literally everything: this is how we trained this model, at this resolution, for this many steps, on this many images, and we did it at this learning rate, that sort of thing. So it's really dense with that information. We're going to find the most important pieces of that and go through them. But it's really cool, and this diffusion stuff is not just images. It's not just video. This is the stuff they're using in AlphaFold, using it to fold proteins. This is the stuff that they're using for lots of robotics-style tasks as well. It has lots of distribution stuff. So this is not just, you know, toy Midjourney sort of stuff.

There’s lots of things you can do if you can turn any problem into a differential equation and work it backwards.

And that’s what this is doing essentially.
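
As a teaser for the flow-matching piece of next week's review, here's the rough shape of the idea in equations; this is the standard rectified-flow formulation, not necessarily the exact variant the WAN paper uses:

```latex
% Flow matching in one picture: blend noise x_0 toward data x_1 along a straight path,
% train a network v_theta to predict the velocity of that path, then integrate the ODE
% from noise to data at sampling time.
\begin{align*}
x_t &= (1-t)\,x_0 + t\,x_1, \qquad t \in [0,1] \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,x_0,\,x_1}\,\big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2 \\
\frac{\mathrm{d}x_t}{\mathrm{d}t} &= v_\theta(x_t, t) \qquad \text{(solved from } t=0 \text{ to } t=1 \text{ to sample)}
\end{align*}
```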

My approach is generally to go to the paper review virtually and then take all the notes, write down all the words I don’t know, and then go back afterward and watch the video again and go find, you know, for me it’s multi-pass, but it’s a lot of information that you get into one block of time.

So it’s pretty good.

I don’t think we got any more questions.

So thanks everybody online and everybody in the room for coming out. It was actually not a freezing or tornadic night. All right, let me end it. Thanks everybody.