
Even casual tech news readers are familiar with generative AI tools like ChatGPT, Stable Diffusion, Midjourney, and DALL-E. Big Tech companies are vying to build the best large language models and incorporate them into every piece of software and web service we use, while a flurry of startups develops specialized AI tools for a wide range of niche use cases.
Many of these tools can generate useful images or text from simple prompts that describe what the user wants to learn or the kind of work they are trying to produce. When it works, a service like ChatGPT or DALL-E can seem like magic. When it fails, we are reminded of how far AI remains from ever replacing human creativity. In fact, many of these tools are "trained" on human-authored works and require human oversight to meaningfully improve their output.
However, recent AI research shows that progress is still being made at a rapid pace, particularly in the area of image manipulation. A group of researchers from Google, MIT, the University of Pennsylvania, and the Max Planck Institute for Informatics in Germany has published a paper detailing an experimental tool that could make image editing easier and more accessible for ordinary people.
To give you an idea of what's possible with the new tool: by simply clicking and dragging on a specific feature, you can significantly alter the appearance of a person or object. You can also change someone's facial expression, swap out a fashion model's attire, or rotate a subject in a photo as if it were a 3D model. Although the tool is not yet available to the general public as of this writing, the video demonstrations are certainly impressive.
It looks like Photoshop on steroids, and it has already generated enough interest to crash the research team's website. After all, text prompts may seem straightforward in theory, but when you need something very specific, or when it takes multiple steps to generate the desired output, they require a lot of tweaking.
This problem has given rise to the "AI prompt engineer" profession. Depending on the company and the specifics of the job in question, this kind of position can pay up to $335,000 per year, and it doesn't require a degree.
By contrast, the user interface shown in the demo videos suggests that the average person will soon be able to do some of what an AI prompt engineer does simply by clicking and dragging on the first output of an image generation tool. According to the researchers, DragGAN can "hallucinate" content that was occluded, deform an object, or alter a landscape.
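For the technically curious, the paper describes a surprisingly simple loop: the user's drag defines a handle point and a target point, and the tool repeatedly nudges the generator's latent code so that the image features at the handle move one small step toward the target, then re-locates the handle by matching features. Below is a rough, unofficial sketch of that idea in PyTorch. The tiny random generator, point coordinates, and hyperparameters are illustrative stand-ins, not the authors' implementation, which operates on a pretrained StyleGAN2.

```python
# Hypothetical sketch of DragGAN-style point dragging (not the authors' code).
# A tiny random ConvNet stands in for the StyleGAN2 generator used in the paper.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

class ToyGenerator(torch.nn.Module):
    """Maps a latent code to (intermediate features, image), like a mini GAN."""
    def __init__(self, latent_dim=64, feat_ch=32):
        super().__init__()
        self.feat_ch = feat_ch
        self.fc = torch.nn.Linear(latent_dim, feat_ch * 8 * 8)
        self.up = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(feat_ch, feat_ch, 4, 2, 1), torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(feat_ch, feat_ch, 4, 2, 1), torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(feat_ch, feat_ch, 4, 2, 1),
        )
        self.to_rgb = torch.nn.Conv2d(feat_ch, 3, 1)

    def forward(self, w):
        x = self.fc(w).view(1, self.feat_ch, 8, 8)
        feats = self.up(x)                  # 64x64 intermediate feature map
        return feats, self.to_rgb(feats)

G = ToyGenerator().eval()
for p in G.parameters():
    p.requires_grad_(False)                 # the generator stays frozen

w = torch.randn(1, 64, requires_grad=True)  # only the latent code is optimized
handle = torch.tensor([20.0, 20.0])         # current handle point (y, x)
target = torch.tensor([40.0, 40.0])         # where the user dragged it
opt = torch.optim.Adam([w], lr=2e-3)

def sample_feature(feats, point):
    """Bilinearly sample the feature vector at a (y, x) pixel location."""
    h = feats.shape[-1] - 1
    grid = torch.stack([point[1] / h * 2 - 1, point[0] / h * 2 - 1])
    return F.grid_sample(feats, grid.view(1, 1, 1, 2), align_corners=True)

with torch.no_grad():
    f0 = sample_feature(G(w)[0], handle)    # reference feature for tracking

for step in range(80):
    feats, _ = G(w)
    # Motion supervision: pull the features one unit step toward the target,
    # detaching the source so the content moves rather than fading.
    d = (target - handle) / (target - handle).norm()
    loss = (sample_feature(feats, handle + d) -
            sample_feature(feats, handle).detach()).abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

    # Point tracking: relocate the handle by nearest-neighbor feature
    # matching against its original feature, in a small search window.
    with torch.no_grad():
        feats, _ = G(w)
        best, best_pt = None, handle
        for dy in range(-2, 3):
            for dx in range(-2, 3):
                cand = handle + torch.tensor([float(dy), float(dx)])
                dist = (sample_feature(feats, cand) - f0).abs().sum()
                if best is None or dist < best:
                    best, best_pt = dist, cand
        handle = best_pt
    if (target - handle).norm() < 1:        # stop once the handle arrives
        break
```

In the actual system, the features come from an intermediate StyleGAN2 layer, several handle points can be dragged at once, and an optional mask confines the edit to one region of the image; the key design point is that a single frozen generator is steered purely through its latent code.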
The researchers claim that DragGAN can morph the contents of an image in a matter of seconds on an Nvidia GeForce RTX 3090 graphics card, since their implementation does not rely on multiple neural networks to achieve the desired result. The next step will be to build a comparable model for editing 3D models using the same point-based approach. Those who wish to learn more about DragGAN can read the paper, and the study will be presented at SIGGRAPH in August.