The world’s first voice AI crime

We may have witnessed the world’s first crime powered by artificial intelligence: synthetic audio was used to imitate the voice of a chief executive and trick his subordinate into transferring more than 240,000 US dollars into a secret account.

The company’s insurer, Euler Hermes, did not name the firm involved. The company’s managing director received a call in which a voice resembling his superior’s instructed him to wire the amount to an account based in Hungary. The payment was supposedly needed to avoid fines for late payment, and the financial details of the transaction were sent over email while the managing director was still on the call. Euler Hermes said the software was able to mimic the voice, including the exact German accent, tonality, and punctuation.

The thieves made a second attempt to transfer funds in the same manner, but this time the managing director grew suspicious and called his boss directly. While he was on the phone with his real boss, the artificial voice called and demanded to speak to him.

Deepfakes have grown in sophistication over the last few years. They are not easily detected by online platforms, and companies have struggled to handle them. Their constant evolution has made it clear that simple detection alone will not be enough, as deepfakes gain audiences through monetization and a constant stream of viral content. There are apps that can put someone’s face onto an actor’s film clips, turning the technology into a source of entertainment. What would have sounded fanciful a few years ago can now be misused by anyone inclined to channel their creativity in improper ways. There are many positive uses too: the technology can humanize automated call systems and help people who have lost their voices speak again. But left unregulated, it could enable fraud and cybercrime on a massive scale.

Cybersecurity firm Symantec reported that it had found at least three instances in which executives’ voices were mimicked to steal money from companies, though it declined to name the victims.

In this technique, a person’s voice is processed and broken down into its phonetic building blocks, such as syllables or individual sounds. These units can then be rearranged to form new phrases with the same tone and speech patterns.
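As a rough illustration of the general idea (and not the software used in the fraud), the sketch below concatenates pre-recorded phonetic units from a target speaker into a new phrase; the unit inventory, clip lengths, and audio content are hypothetical placeholders.

```python
# Toy sketch of concatenative voice synthesis: pre-recorded phonetic units
# from a target speaker are rearranged into a new phrase. The unit inventory
# and the audio arrays here are hypothetical placeholders.
import numpy as np

SAMPLE_RATE = 16_000

# Pretend inventory: each phonetic unit maps to a short audio clip (1-D array).
# In a real system these would be extracted from recordings of the target voice.
unit_inventory = {
    "trans": np.random.randn(3200),   # ~0.2 s of placeholder audio per unit
    "fer":   np.random.randn(2400),
    "the":   np.random.randn(1600),
    "funds": np.random.randn(4000),
}

def synthesize(units, crossfade=160):
    """Concatenate units with a short linear crossfade to smooth the joins."""
    out = unit_inventory[units[0]].copy()
    for name in units[1:]:
        nxt = unit_inventory[name]
        fade = np.linspace(0.0, 1.0, crossfade)
        # Blend the tail of the current signal with the head of the next unit.
        out[-crossfade:] = out[-crossfade:] * (1 - fade) + nxt[:crossfade] * fade
        out = np.concatenate([out, nxt[crossfade:]])
    return out

phrase = synthesize(["trans", "fer", "the", "funds"])
print(f"Generated {len(phrase) / SAMPLE_RATE:.2f} s of audio")
```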

Researchers are working to develop systems that can detect fraudulent audio clips and combat them. Google, meanwhile, has created one of the world’s most persuasive AI voice services, Duplex, which can call restaurants to book tables in a simulated, lifelike voice.

Scientists will have to be cautious about how these technologies are released, and proper systems will need to be put in place to fight scams.

Guided by AI, robotic platform automates molecule manufacture

Guided by artificial intelligence and powered by a robotic platform, a system developed by MIT researchers moves a step closer to automating the production of small molecules that could be used in medicine, solar energy, and polymer chemistry.

The system, described in the August 8 issue of Science, could free up bench chemists from a variety of routine and time-consuming tasks, and may suggest possibilities for how to make new molecular compounds, according to the study co-leaders Klavs F. Jensen, the Warren K. Lewis Professor of Chemical Engineering, and Timothy F. Jamison, the Robert R. Taylor Professor of Chemistry and associate provost at MIT.

The technology “has the promise to help people cut out all the tedious parts of molecule building,” including looking up potential reaction pathways and building the components of a molecular assembly line each time a new molecule is produced, says Jensen.

“And as a chemist, it may give you inspirations for new reactions that you hadn’t thought about before,” he adds.

Other MIT authors on the Science paper include Connor W. Coley, Dale A. Thomas III, Justin A. M. Lummiss, Jonathan N. Jaworski, Christopher P. Breen, Victor Schultz, Travis Hart, Joshua S. Fishman, Luke Rogers, Hanyu Gao, Robert W. Hicklin, Pieter P. Plehiers, Joshua Byington, John S. Piotti, William H. Green, and A. John Hart.

From inspiration to recipe to finished product

The new system combines three main steps. First, software guided by artificial intelligence suggests a route for synthesizing a molecule, then expert chemists review this route and refine it into a chemical “recipe,” and finally the recipe is sent to a robotic platform that automatically assembles the hardware and performs the reactions that build the molecule.

Coley and his colleagues have been working for more than three years to develop the open-source software suite that suggests and prioritizes possible synthesis routes. At the heart of the software are several neural network models, which the researchers trained on millions of previously published chemical reactions drawn from the Reaxys and U.S. Patent and Trademark Office databases. The software uses these data to identify the reaction transformations and conditions that it believes will be suitable for building a new compound.
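As a hedged illustration of this kind of route suggestion (not the team’s actual open-source suite), the sketch below ranks hypothetical reaction templates with a stand-in scoring function and greedily expands a target into purchasable starting materials; the template names, priors, and building-block list are all invented.

```python
# Minimal sketch of AI-guided retrosynthesis route suggestion (hypothetical;
# not the MIT team's actual software). A scoring model ranks candidate
# disconnections for a target, and the search expands the best ones until
# only purchasable building blocks remain.
import heapq

# Hypothetical "reaction templates": target fragment -> (precursors, name)
TEMPLATES = {
    "amide":   (["carboxylic_acid", "amine"], "amide coupling"),
    "ether":   (["alcohol", "alkyl_halide"], "Williamson ether synthesis"),
    "alcohol": (["ketone"], "ketone reduction"),
}
PURCHASABLE = {"carboxylic_acid", "amine", "alkyl_halide", "ketone"}

def score(fragment, template_name):
    # Stand-in for a neural model that predicts how likely a transformation
    # is to succeed; here just a fixed prior per template.
    priors = {"amide coupling": 0.9, "Williamson ether synthesis": 0.7,
              "ketone reduction": 0.8}
    return priors.get(template_name, 0.1)

def suggest_route(target_fragments):
    """Greedy best-first expansion of the highest-scoring disconnections."""
    route, frontier = [], [(0.0, f) for f in target_fragments]
    heapq.heapify(frontier)
    while frontier:
        _, frag = heapq.heappop(frontier)
        if frag in PURCHASABLE:
            continue                      # available starting material
        precursors, name = TEMPLATES[frag]
        s = score(frag, name)
        route.append((name, frag, precursors, s))
        for p in precursors:
            heapq.heappush(frontier, (-s, p))
    return route

for step in suggest_route(["amide", "ether"]):
    print(step)
```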

“It helps make high-level decisions about what kinds of intermediates and starting materials to use, and then slightly more detailed analyses about what conditions you might want to use and if those reactions are likely to be successful,” says Coley.

“One of the primary motivations behind the design of the software is that it doesn’t just give you suggestions for molecules we know about or reactions we know about,” he notes. “It can generalize to new molecules that have never been made.”

Chemists then review the suggested synthesis routes produced by the software to build a more complete recipe for the target molecule. The chemists sometimes need to perform lab experiments or tinker with reagent concentrations and reaction temperatures, among other changes.

“They take some of the inspiration from the AI and convert that into an executable recipe file, largely because the chemical literature at present does not have enough information to move directly from inspiration to execution on an automated system,” Jamison says.

The final recipe is then loaded on to a platform where a robotic arm assembles modular reactors, separators, and other processing units into a continuous flow path, connecting pumps and lines that bring in the molecular ingredients.

“You load the recipe — that’s what controls the robotic platform — you load the reagents on, and press go, and that allows you to generate the molecule of interest,” says Thomas. “And then when it’s completed, it flushes the system and you can load the next set of reagents and recipe, and allow it to run.”

Unlike the continuous flow system the researchers presented last year, which had to be manually configured after each synthesis, the new system is entirely configured by the robotic platform.

“This gives us the ability to sequence one molecule after another, as well as generate a library of molecules on the system, autonomously,” says Jensen.

The design for the platform, which is about two cubic meters in size — slightly smaller than a standard chemical fume hood — resembles a telephone switchboard and operator system that moves connections between the modules on the platform.

“The robotic arm is what allowed us to manipulate the fluidic paths, which reduced the number of process modules and fluidic complexity of the system, and by reducing the fluidic complexity we can increase the molecular complexity,” says Thomas. “That allowed us to add additional reaction steps and expand the set of reactions that could be completed on the system within a relatively small footprint.”

Toward full automation

The researchers tested the full system by creating 15 different medicinal small molecules of varying synthesis complexity, with processes taking anywhere from two hours for the simplest creations to about 68 hours for manufacturing multiple compounds.

The team synthesized a variety of compounds: aspirin and the antibiotic secnidazole in back-to-back processes; the painkiller lidocaine and the antianxiety drug diazepam in back-to-back processes using a common feedstock of reagents; the blood thinner warfarin and the Parkinson’s disease drug safinamide, to show how the software could design compounds with similar molecular components but differing 3-D structures; and a family of five ACE inhibitor drugs and a family of four nonsteroidal anti-inflammatory drugs.

“I’m particularly proud of the diversity of the chemistry and the kinds of different chemical reactions,” says Jamison, who said the system handled about 30 different reactions compared to about 12 different reactions in the previous continuous flow system.

“We are really trying to close the gap between idea generation from these programs and what it takes to actually run a synthesis,” says Coley. “We hope that next-generation systems will increase further the fraction of time and effort that scientists can focus on creativity and design.”

Materials provided by Massachusetts Institute of Technology


The brain inspires new type of AI algorithms

Machine learning, developed some 70 years ago, is loosely based on the learning dynamics of the human brain. With the help of fast, large-scale computers and enormous datasets, deep learning algorithms have produced results comparable to those of human specialists in various fields. However, the way they learn differs from our present understanding of learning in neuroscience.

Using advanced experiments on neuronal cultures and large-scale simulations, a team of scientists at Bar-Ilan University in Israel has demonstrated a new kind of ultrafast artificial intelligence algorithm that is based on the brain’s slow dynamics yet exceeds the learning rates attained to date by state-of-the-art learning algorithms. The paper has been published in Scientific Reports.

The study’s lead author, Prof. Ido Kanter, of Bar-Ilan University’s Department of Physics and Gonda (Goldschmied) Multidisciplinary Brain Research Center, said that until now neurobiology and machine learning have been treated as separate disciplines that progressed independently, and that the absence of likely reciprocal influence is puzzling.

He added that the brain’s data-processing speed is slower than that of the first computer, invented over 70 years ago, and that the number of neurons in a brain is smaller than the number of bits on a typical modern disc. Prof. Kanter, whose research team includes Herut Uzan, Shira Sardi, Amir Goldental, and Roni Vardi, also noted that the brain’s learning rules are very complex and far removed from the principles of learning used in artificial intelligence algorithms. Because the biological system has to deal with asynchronous inputs, brain dynamics do not follow a well-defined clock synchronized across the nerve cells.

A key difference between artificial intelligence algorithms and the human brain is the nature of the inputs they handle. The human brain deals with asynchronous inputs, where the relative position of objects and their temporal ordering matter, such as identifying cars, pedestrians, and road signs while driving. AI algorithms, on the other hand, deal with synchronous inputs, where relative timing is ignored.

Recent studies have found that ultrafast learning rates are, unexpectedly, nearly identical for small and large networks, so the apparent disadvantage of the brain’s complicated learning scheme turns out to be an advantage. Another important finding is that learning can occur without explicit learning steps, through self-adaptation to asynchronous inputs. This type of learning-without-learning-steps occurs in the dendrites, the branching terminals of each neuron, as was recently observed experimentally.

The concept of efficient deep learning algorithms based on the brain’s very slow dynamics offers the possibility of implementing an advanced type of artificial intelligence built on fast computation, bridging the gap between neurobiology and artificial intelligence. The researchers conclude that insights into our brain’s principles must once again be placed at the centre of artificial intelligence.

Journal Reference: Scientific Reports

Automating artificial intelligence for medical decision-making

MIT computer scientists are hoping to accelerate the use of artificial intelligence to improve medical decision-making, by automating a key step that’s usually done by hand — and that’s becoming more laborious as certain datasets grow ever-larger.

The field of predictive analytics holds increasing promise for helping clinicians diagnose and treat patients. Machine-learning models can be trained to find patterns in patient data to aid in sepsis care, design safer chemotherapy regimens, and predict a patient’s risk of having breast cancer or dying in the ICU, to name just a few examples.

Typically, training datasets consist of many sick and healthy subjects, but with relatively little data for each subject. Experts must then find just those aspects — or “features” — in the datasets that will be important for making predictions.

This “feature engineering” can be a laborious and expensive process. But it’s becoming even more challenging with the rise of wearable sensors, because researchers can more easily monitor patients’ biometrics over long periods, tracking sleeping patterns, gait, and voice activity, for example. After only a week’s worth of monitoring, experts could have several billion data samples for each subject.

In a paper being presented at the Machine Learning for Healthcare conference this week, MIT researchers demonstrate a model that automatically learns features predictive of vocal cord disorders. The features come from a dataset of about 100 subjects, each with about a week’s worth of voice-monitoring data and several billion samples; in other words, a small number of subjects and a large amount of data per subject. The dataset contains signals captured from a small accelerometer sensor mounted on subjects’ necks.

In experiments, the model used features automatically extracted from these data to classify, with high accuracy, patients with and without vocal cord nodules. These are lesions that develop in the larynx, often because of patterns of voice misuse such as belting out songs or yelling. Importantly, the model accomplished this task without a large set of hand-labeled data.

“It’s becoming increasingly easy to collect long time-series datasets. But you have physicians that need to apply their knowledge to labeling the dataset,” says lead author Jose Javier Gonzalez Ortiz, a PhD student in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). “We want to remove that manual part for the experts and offload all feature engineering to a machine-learning model.”

The model can be adapted to learn patterns of any disease or condition. But the ability to detect the daily voice-usage patterns associated with vocal cord nodules is an important step in developing improved methods to prevent, diagnose, and treat the disorder, the researchers say. That could include designing new ways to identify and alert people to potentially damaging vocal behaviors.

Joining Gonzalez Ortiz on the paper is John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering and head of CSAIL’s Data Driven Inference Group; Robert Hillman, Jarrad Van Stan, and Daryush Mehta, all of Massachusetts General Hospital’s Center for Laryngeal Surgery and Voice Rehabilitation; and Marzyeh Ghassemi, an assistant professor of computer science and medicine at the University of Toronto.

Forced feature-learning

For years, the MIT researchers have worked with the Center for Laryngeal Surgery and Voice Rehabilitation to develop and analyze data from a sensor to track subject voice usage during all waking hours. The sensor is an accelerometer with a node that sticks to the neck and is connected to a smartphone. As the person talks, the smartphone gathers data from the displacements in the accelerometer.

In their work, the researchers collected a week’s worth of this data — called “time-series” data — from 104 subjects, half of whom were diagnosed with vocal cord nodules. For each patient, there was also a matching control, meaning a healthy subject of similar age, sex, occupation, and other factors.

Traditionally, experts would need to manually identify features that may be useful for a model to detect various diseases or conditions. That helps prevent a common machine-learning problem in health care: overfitting. That’s when, in training, a model “memorizes” subject data instead of learning just the clinically relevant features. In testing, those models often fail to discern similar patterns in previously unseen subjects.

“Instead of learning features that are clinically significant, a model sees patterns and says, ‘This is Sarah, and I know Sarah is healthy, and this is Peter, who has a vocal cord nodule.’ So, it’s just memorizing patterns of subjects. Then, when it sees data from Andrew, who has a new vocal usage pattern, it can’t figure out if those patterns match a classification,” Gonzalez Ortiz says.

The main challenge, then, was preventing overfitting while automating manual feature engineering. To that end, the researchers forced the model to learn features without subject information. For their task, that meant capturing all moments when subjects speak and the intensity of their voices.

As their model crawls through a subject’s data, it’s programmed to locate voicing segments, which comprise only roughly 10 percent of the data. For each of these voicing windows, the model computes a spectrogram, a visual representation of the spectrum of frequencies varying over time, which is often used for speech processing tasks. The spectrograms are then stored as large matrices of thousands of values.

But those matrices are huge and difficult to process. So, an autoencoder — a neural network optimized to generate efficient data encodings from large amounts of data — first compresses the spectrogram into an encoding of 30 values. It then decompresses that encoding into a separate spectrogram.

Basically, the model must ensure that the decompressed spectrogram closely resembles the original spectrogram input. In doing so, it’s forced to learn the compressed representation of every spectrogram segment input over each subject’s entire time-series data. The compressed representations are the features that help train machine-learning models to make predictions.
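A minimal sketch of such an autoencoder is shown below in PyTorch; the spectrogram dimensions, layer sizes, and random training data are illustrative assumptions, not the authors’ actual architecture.

```python
# Illustrative spectrogram autoencoder (not the authors' exact architecture):
# each flattened spectrogram segment is compressed to 30 values and then
# reconstructed, so the 30-value code becomes the learned feature vector.
import torch
import torch.nn as nn

SPEC_DIM = 128 * 64        # hypothetical flattened spectrogram size (freq x time)
CODE_DIM = 30              # size of the compressed representation

class SpectrogramAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(SPEC_DIM, 512), nn.ReLU(),
            nn.Linear(512, CODE_DIM),
        )
        self.decoder = nn.Sequential(
            nn.Linear(CODE_DIM, 512), nn.ReLU(),
            nn.Linear(512, SPEC_DIM),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = SpectrogramAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder batch of flattened spectrogram segments (random data stands in
# for real voicing windows).
batch = torch.randn(32, SPEC_DIM)
for _ in range(5):                     # a few toy training steps
    recon, code = model(batch)
    loss = loss_fn(recon, batch)       # reconstruction must match the input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

features = code.detach()               # 32 x 30 feature vectors for downstream models
print(features.shape)
```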

Mapping normal and abnormal features

In training, the model learns to map those features to “patients” or “controls.” Patients will have more voicing patterns than will controls. In testing on previously unseen subjects, the model similarly condenses all spectrogram segments into a reduced set of features. Then, it’s majority rules: If the subject has mostly abnormal voicing segments, they’re classified as patients; if they have mostly normal ones, they’re classified as controls.
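A minimal sketch of that majority-rules step, assuming each voicing segment has already been labeled normal or abnormal by some per-segment classifier (the labels below are made up):

```python
# Majority-rules subject classification: a subject is labeled a patient if most
# of their voicing segments are flagged as abnormal. Segment labels here are
# hypothetical outputs of a per-segment classifier.
def classify_subject(segment_is_abnormal):
    abnormal = sum(segment_is_abnormal)
    return "patient" if abnormal > len(segment_is_abnormal) / 2 else "control"

print(classify_subject([True, True, False, True]))   # -> patient
print(classify_subject([False, False, True, False])) # -> control
```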

In experiments, the model performed as accurately as state-of-the-art models that require manual feature engineering. Importantly, the researchers’ model performed accurately in both training and testing, indicating it’s learning clinically relevant patterns from the data, not subject-specific information.

Next, the researchers want to monitor how various treatments, such as surgery and vocal therapy, impact vocal behavior. If patients’ behaviors move from abnormal to normal over time, they’re most likely improving. They also hope to use a similar technique on electrocardiogram data, which is used to track the muscular function of the heart.

Materials provided by Massachusetts Institute of Technology


Chinese newspaper uses robots to generate stories and articles

On Thursday, the China Science Daily publicly announced that it has used software to automatically generate news articles and stories about the latest discoveries and developments from the leading science journals across the world.

The robot reporter, named “Xiaoke,” was developed by the newspaper in collaboration with researchers from Peking University over nearly half a year. According to its creators, Xiaoke has generated more than 200 stories based on the English abstracts of papers published in journals such as Science, Nature, Cell, and the New England Journal of Medicine.

Before publication, the content undergoes a review process in which a group of scientists and editors verify it or provide supplementary information if needed. Zhang Mingwei, the program’s head and the newspaper’s deputy editor-in-chief, said the team aims to turn Xiaoke into a cross-linguistic academic secretary that helps Chinese researchers overcome language barriers and gain quick, easy access to recent advances reported in English-language publications. The articles created by Xiaoke are published online (see the link below).

Wan Xiaojun, the Peking University researcher in charge of the system’s design and technology, said the tool can do a lot more than just translation. Xiaoke is good at selecting key words and sentences and can turn complex, confusing terms into simple, readable news reports, since the newspaper’s readers include the general public as well as professionals.

Science reporting is important for spreading information about recent discoveries and popularizing knowledge among the masses, and manual review by reporters also helps keep readers engaged. The paper recently published a story on a Canadian study, reported in Nature, that traced the origin and development of cerebellar tumors using a gene-sequencing approach; the robot also provided the original abstract and a link to the paper. Robots can also save writing time compared with human editors. According to its creators, more journals, conferences, and patent information will be covered by automated systems in the coming days.

Recently, many Chinese news organizations have used robots to generate news stories on sports, finance, and weather forecasts. The China Science Daily is administered by the Chinese Academy of Sciences and is published five days a week, with a circulation of close to 100,000 across the country.

The stories generated by Xiaoke have been published online at:  http://paper.sciencenet.cn/AInews/


Researchers develop deep neural network to identify deepfakes

Seeing was once believing, until we learnt that photo-editing tools can alter the images we see. Technology has taken this one notch higher: the facial expressions of one person can now be mapped onto another person in realistic videos known as deepfakes. These manipulations are not undetectable, however, because all image and video editing tools leave traces that can be identified.

A group of researchers led by Amit Roy-Chowdhury’s Video Computing Group at the University of California, Riverside has created a deep neural network architecture that can detect manipulated images at the pixel level with very high accuracy. Roy-Chowdhury is a professor of computer science and electrical engineering at the Marlan and Rosemary Bourns College of Engineering and a Bourns Family Faculty Fellow. The study has been published in the IEEE Digital Library.

As artificial intelligence researchers explain, a deep neural network is a computer system that has been trained to perform a specific task, in this case identifying altered images. These networks are organised in many connected layers.

Objects in images have boundaries, and whenever an object is inserted into or removed from an image, its boundary looks different from the boundaries that occur naturally. People with good Photoshop skills will do their best to make these boundaries appear natural, but examining the image pixel by pixel brings out the differences. By checking the boundaries, a computer can therefore distinguish between a normal and an altered image.

The scientists labelled unmanipulated images, as well as the relevant boundary pixels in manipulated images, across a large photo dataset, and fed the neural network this information about both the manipulated and the natural regions. The network was then tested on a separate set of images; it successfully detected the manipulated images most of the time, identified the altered region, and provided the probability that an image had been manipulated. The scientists are working with still images for now, but the technique can also be applied to deepfake videos.
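As a hedged sketch of this kind of pixel-level detection (not the UC Riverside architecture), the example below trains a tiny fully convolutional network to output a per-pixel probability of manipulation against placeholder boundary labels.

```python
# Illustrative per-pixel manipulation detector (not the published model):
# a small fully convolutional network outputs, for every pixel, the probability
# that it lies on a tampered object boundary. Images and labels are placeholders.
import torch
import torch.nn as nn

class PixelTamperNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),   # one logit per pixel
        )

    def forward(self, x):
        return self.net(x)                      # shape: (batch, 1, H, W)

model = PixelTamperNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

images = torch.randn(4, 3, 64, 64)                     # placeholder RGB images
labels = torch.randint(0, 2, (4, 1, 64, 64)).float()   # 1 = manipulated boundary pixel

for _ in range(3):                              # a few toy training steps
    logits = model(images)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Per-pixel manipulation probabilities; an image-level score can be taken as
# the maximum (or mean) probability over the map.
probs = torch.sigmoid(model(images))
print(probs.max().item())
```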

Roy-Chowdhury pointed out that a video is essentially a collection of still images, so a method that works on a still image can also be applied to video. The challenge, however, lies in figuring out whether a given frame of a video has been altered, and there is a long way to go before deepfake videos can be identified by automated tools.

Roy-Chowdhury also noted that cybersecurity resembles a cat-and-mouse game: as defence mechanisms improve, attackers come up with better alternatives. He believes a combination of human and automated systems is the right mix for the task. Neural networks can draw up a list of suspicious images and videos for people to review, and automation can then reduce the amount of data that has to be sifted through to determine whether an image has been altered. He said this might become possible within a few years with the help of these technologies.

Journal Reference: IEEE Digital Library


AI program defeats professionals in six-player poker game

An AI program created by Carnegie Mellon University researchers in collaboration with Facebook has beaten top professionals in the world’s most popular form of poker, six-player no-limit Texas hold’em poker.

The AI, named Pluribus, defeated two top players: Darren Elias, who holds the record for the most World Poker Tour titles, and Chris “Jesus” Ferguson, winner of six World Series of Poker events. Each of them separately played 5,000 hands of poker against five copies of Pluribus.
In another experiment, Pluribus played against 13 pros, all of whom have won more than a million dollars playing poker. It faced five pros at a time, for a total of 10,000 hands, and again emerged as the winner.

Tuomas Sandholm, a computer science professor at CMU, created Pluribus with Noam Brown, a computer science Ph.D. and research scientist at Facebook AI. Sandholm said that Pluribus’s superhuman performance at multi-player poker represents a milestone in AI and in game theory that has been open for decades; milestones in strategic reasoning by AI had until now been limited to two-party contests. Defeating five other players in such a complicated game opens up new possibilities for using AI to solve real-world problems. The paper detailing the achievement has been published in the journal Science. Playing a six-player contest rather than a two-player one requires fundamental changes in how the AI develops its strategy, and Brown said that some of the strategies used by Pluribus may even change the way professionals approach the game.

Pluribus’s algorithms produced some surprising features in its strategy. For example, most human players avoid “donk betting” (ending one round with a call but starting the next with a bet) because it is generally viewed as a weak move. Pluribus, however, used a large number of donk bets in its games.

Michael “Gags” Gagliano, who has won almost two million dollars in his career, also played against Pluribus. He was fascinated by some of the strategies it used and noted that several plays related to bet sizing are simply not being made by human players.

Sandholm and Brown had earlier developed Libratus, which defeated four poker professionals playing a combined 120,000 hands in the two-player version of the game. In games with more than two players, such as six-player poker, playing a Nash equilibrium strategy (the approach generally used by two-player game AIs) can still lead to defeat. Pluribus instead first plays against six copies of itself to develop a blueprint strategy, and then looks ahead several moves during live play to refine that strategy. A newly developed limited-lookahead search algorithm enabled Pluribus to achieve superhuman results. It also mixes in unpredictable moves, since opponents often assume an AI bets only when it has the best possible hand. Pluribus is computationally efficient: it computed its blueprint strategy in eight days using 12,400 core hours, compared with the 15 million core hours used by Libratus.
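Pluribus’s blueprint is computed with a sophisticated form of Monte Carlo counterfactual regret minimization over poker’s enormous game tree. The toy sketch below only illustrates the underlying self-play idea, regret matching on rock-paper-scissors, where a strategy improves by playing against a copy of itself; it is not the actual poker algorithm.

```python
# Toy regret-matching self-play on rock-paper-scissors. This only shows the
# core idea of improving a strategy by playing against copies of itself;
# Pluribus uses a far more sophisticated method over poker's game tree.
import random

ACTIONS = 3                                      # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]    # row action vs column action

def strategy_from(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / ACTIONS] * ACTIONS

def sample(strategy):
    return random.choices(range(ACTIONS), weights=strategy)[0]

regrets = [0.0] * ACTIONS
strategy_sum = [0.0] * ACTIONS

for _ in range(20000):
    strat = strategy_from(regrets)
    strategy_sum = [s + p for s, p in zip(strategy_sum, strat)]
    my_action = sample(strat)
    opp_action = sample(strat)           # the "opponent" is a copy of itself
    for a in range(ACTIONS):             # regret: how much better action a would have done
        regrets[a] += PAYOFF[a][opp_action] - PAYOFF[my_action][opp_action]

total = sum(strategy_sum)
print([round(s / total, 3) for s in strategy_sum])   # converges toward (1/3, 1/3, 1/3)
```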


Tsunami of fake news generated by AI can ruin the Internet

Several new tools can recreate the face of a human being or the voice of a writer to a very high level of accuracy. The most concerning of them, however, is the adorably named software GROVER, developed by a team of researchers at the Allen Institute for Artificial Intelligence.

It is a fake-news-writing bot that several people have already used to compose blog posts and even entire subreddits. It is an application of natural language generation that has raised several concerns: while there are positive uses such as translation and summarization, the same technology allows adversaries to generate neural fake news. This illustrates the problems that AI-written news can pose to humanity. Online fake news is already used for advertising gains, influencing mass opinion, and manipulating elections.


GROVER can create an entire article from nothing more than a headline, such as “Link Found Between Autism and Vaccines”. Readers found the generated articles more trustworthy than ones composed by other human beings.

This could be just the beginning of the chaos. Kristin Tynski of the marketing agency Fractl said that tools such as GROVER could create a giant tsunami of computer-generated content in every possible field. GROVER is not perfect, but it can certainly convince a casual reader who is not paying attention to every word they are reading.

The current best discriminators can distinguish fake news from genuine, human-written news with an accuracy of 73 percent, assuming they have access to a moderate amount of training data. Surprisingly, the best defense against GROVER is GROVER itself, which reaches an accuracy of 92 percent. This presents a very promising opportunity against neural fake news: the models that are best at generating it are also the best at detecting it.
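For a sense of what a (much weaker) discriminator looks like in practice, the sketch below trains a simple TF-IDF plus logistic-regression baseline on a handful of placeholder articles labeled human- or machine-written; it is not GROVER’s own detector, and the example texts are invented.

```python
# A simple baseline discriminator for machine-generated text (nothing like
# GROVER's own detector): TF-IDF features plus logistic regression, trained on
# a handful of placeholder examples labeled human (0) or machine-written (1).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "City council approves budget after lengthy public hearing.",            # human
    "Local team wins championship in overtime thriller.",                    # human
    "Scientists reveal shocking link that experts say changes everything.",  # machine
    "Sources confirm the unprecedented event stunned researchers worldwide.",# machine
]
labels = [0, 0, 1, 1]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

new_article = "Experts say the shocking discovery stunned researchers."
print(detector.predict_proba([new_article])[0][1])   # probability it is machine-written
```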

AI developers at Google and other companies face a huge task, as AI-generated spam floods the internet and advertising agencies look to extract the maximum possible revenue from it. Developing robust verification methods against generators such as GROVER is an important research field. Tynski said in a statement that because AI systems enable content creation at an enormous scale and pace, producing material that both humans and search engines have difficulty distinguishing from original content, this is a very important discussion that we are not yet having.

Robot uses machine learning to harvest lettuce

The ‘Vegebot’, developed by a team at the University of Cambridge, was initially trained to recognize and harvest iceberg lettuce in a lab setting. It has now been successfully tested in a variety of field conditions in cooperation with G’s Growers, a local fruit and vegetable co-operative.

For a human, the entire process takes a couple of seconds, but it’s a really challenging problem for a robot

–Josie Hughes

Although the prototype is nowhere near as fast or efficient as a human worker, it demonstrates how the use of robotics in agriculture might be expanded, even for crops like iceberg lettuce which are particularly challenging to harvest mechanically. The results are published in The Journal of Field Robotics.

Crops such as potatoes and wheat have been harvested mechanically at scale for decades, but many other crops have to date resisted automation. Iceberg lettuce is one such crop. Although it is the most common type of lettuce grown in the UK, iceberg is easily damaged and grows relatively flat to the ground, presenting a challenge for robotic harvesters.

“Every field is different, every lettuce is different,” said co-author Simon Birrell from Cambridge’s Department of Engineering. “But if we can make a robotic harvester work with iceberg lettuce, we could also make it work with many other crops.”

“At the moment, harvesting is the only part of the lettuce life cycle that is done manually, and it’s very physically demanding,” said co-author Julia Cai, who worked on the computer vision components of the Vegebot while she was an undergraduate student in the lab of Dr. Fumiya Iida.

The Vegebot first identifies the ‘target’ crop within its field of vision, then determines whether a particular lettuce is healthy and ready to be harvested, and finally cuts the lettuce from the rest of the plant without crushing it so that it is ‘supermarket ready’. “For a human, the entire process takes a couple of seconds, but it’s a really challenging problem for a robot,” said co-author Josie Hughes.

The Vegebot has two main components: a computer vision system and a cutting system. The overhead camera on the Vegebot takes an image of the lettuce field and first identifies all the lettuces in the image, and then for each lettuce, classifies whether it should be harvested or not. A lettuce might be rejected because it’s not yet mature, or it might have a disease that could spread to other lettuces in the harvest.

The researchers developed and trained a machine learning algorithm on example images of lettuces. Once the Vegebot could recognise healthy lettuces in the lab, it was then trained in the field, in a variety of weather conditions, on thousands of real lettuces.
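The Vegebot’s vision code is not reproduced here; as a rough sketch of the classification step just described, the example below trains a small generic CNN to label a cropped lettuce image as ready to harvest or not, using random placeholder images and labels.

```python
# Generic harvest-readiness classifier sketch (not the Vegebot's actual vision
# code): a small CNN takes a cropped lettuce image and predicts whether it
# should be harvested. Images and labels below are random placeholders.
import torch
import torch.nn as nn

class LettuceClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, 2)    # classes: reject / harvest

    def forward(self, x):                          # x: (batch, 3, 64, 64)
        h = self.features(x)
        return self.head(h.flatten(1))

model = LettuceClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 64, 64)                 # placeholder lettuce crops
labels = torch.randint(0, 2, (8,))                 # 1 = healthy and ready to harvest

for _ in range(3):                                 # a few toy training steps
    loss = loss_fn(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(model(images).argmax(dim=1))                 # predicted harvest decisions
```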

A second camera on the Vegebot is positioned near the cutting blade and helps ensure a smooth cut. The researchers were also able to adjust the pressure in the robot’s gripping arm so that it held the lettuce firmly enough not to drop it, but not so firm as to crush it. The force of the grip can be adjusted for other crops.

“We wanted to develop approaches that weren’t necessarily specific to iceberg lettuce so that they can be used for other types of above-ground crops,” said Iida, who leads the team behind the research.

In future, robotic harvesters could help address problems with labour shortages in agriculture, and could also help reduce food waste. At the moment, each field is typically harvested once, and any unripe vegetables or fruits are discarded. However, a robotic harvester could be trained to pick only ripe vegetables, and since it could harvest around the clock, it could perform multiple passes on the same field, returning at a later date to harvest the vegetables that were unripe during previous passes.

“We’re also collecting lots of data about lettuce, which could be used to improve efficiency, such as which fields have the highest yields,” said Hughes. “We’ve still got to speed our Vegebot up to the point where it could compete with a human, but we think robots have lots of potential in agri-tech.”

Iida’s group at Cambridge is also part of the world’s first Centre for Doctoral Training (CDT) in agri-food robotics. In collaboration with researchers at the University of Lincoln and the University of East Anglia, the Cambridge researchers will train the next generation of specialists in robotics and autonomous systems for application in the agri-tech sector. The Engineering and Physical Sciences Research Council (EPSRC) has awarded £6.6m for the new CDT, which will support at least 50 PhD students.

Reference:
Simon Birrell et al. ‘A Field-Tested Robotic Harvesting System for Iceberg Lettuce.’ Journal of Field Robotics (2019). DOI: 10.1002/rob.21888

Materials provided by the University of Cambridge

New AI programming language goes beyond deep learning

A team of MIT researchers is making it easier for novices to get their feet wet with artificial intelligence, while also helping experts advance the field.

In a paper presented at the Programming Language Design and Implementation conference this week, the researchers describe a novel probabilistic-programming system named “Gen.” Users write models and algorithms from multiple fields where AI techniques are applied — such as computer vision, robotics, and statistics — without having to deal with equations or manually write high-performance code. Gen also lets expert researchers write sophisticated models and inference algorithms — used for prediction tasks — that were previously infeasible.

In their paper, for instance, the researchers demonstrate that a short Gen program can infer 3-D body poses, a difficult computer-vision inference task that has applications in autonomous systems, human-machine interactions, and augmented reality. Behind the scenes, this program includes components that perform graphics rendering, deep-learning, and types of probability simulations. The combination of these diverse techniques leads to better accuracy and speed on this task than earlier systems developed by some of the researchers.

Due to its simplicity — and, in some use cases, automation — the researchers say Gen can be used easily by anyone, from novices to experts. “One motivation of this work is to make automated AI more accessible to people with less expertise in computer science or math,” says first author Marco Cusumano-Towner, a Ph.D student in the Department of Electrical Engineering and Computer Science. “We also want to increase productivity, which means making it easier for experts to rapidly iterate and prototype their AI systems.”

The researchers also demonstrated Gen’s ability to simplify data analytics by using another Gen program that automatically generates sophisticated statistical models typically used by experts to analyze, interpret, and predict underlying patterns in data. That builds on the researchers’ previous work that let users write a few lines of code to uncover insights into financial trends, air travel, voting patterns, and the spread of disease, among other trends. This is different from earlier systems, which required a lot of hand coding for accurate predictions.

“Gen is the first system that’s flexible, automated, and efficient enough to cover those very different types of examples in computer vision and data science and give state-of-the-art performance,” says Vikash K. Mansinghka ’05, MEng ’09, PhD ’09, a researcher in the Department of Brain and Cognitive Sciences who runs the Probabilistic Computing Project.

Joining Cusumano-Towner and Mansinghka on the paper are Feras Saad and Alexander K. Lew, both CSAIL graduate students and members of the Probabilistic Computing Project.

Best of all worlds

In 2015, Google released TensorFlow, an open-source library of application programming interfaces (APIs) that helps beginners and experts automatically generate machine-learning systems without doing much math. Now widely used, the platform is helping democratize some aspects of AI. But, although it’s automated and efficient, it’s narrowly focused on deep-learning models which are both costly and limited compared to the broader promise of AI in general.

But there are plenty of other AI techniques available today, such as statistical and probabilistic models, and simulation engines. Some other probabilistic programming systems are flexible enough to cover several kinds of AI techniques, but they run inefficiently.

The researchers sought to combine the best of all worlds — automation, flexibility, and speed — into one. “If we do that, maybe we can help democratize this much broader collection of modeling and inference algorithms, like TensorFlow did for deep learning,” Mansinghka says.

In probabilistic AI, inference algorithms perform operations on data and continuously readjust probabilities based on new data to make predictions. Doing so eventually produces a model that describes how to make predictions on new data.
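Gen itself embeds its modeling languages in Julia; purely as a conceptual illustration of the inference loop described above, the Python sketch below uses importance sampling to readjust beliefs about a coin’s bias as observations arrive. The model and data are toy assumptions, not Gen’s API.

```python
# Conceptual illustration of probabilistic inference (not Gen's Julia API):
# importance sampling re-weights prior guesses about a coin's bias as data
# arrive, so the posterior estimate keeps adjusting to new observations.
import random

def likelihood(bias, flips):
    p = 1.0
    for f in flips:
        p *= bias if f == 1 else (1 - bias)
    return p

observed = [1, 1, 0, 1, 1, 1, 0, 1]        # observed coin flips (1 = heads)

samples = [random.random() for _ in range(10000)]      # draws from a uniform prior
weights = [likelihood(b, observed) for b in samples]   # re-weight by the data
total = sum(weights)
posterior_mean = sum(b * w for b, w in zip(samples, weights)) / total

print(f"Posterior mean bias ~ {posterior_mean:.2f}")   # shifts toward ~0.7 with this data
```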

Building off concepts used in their earlier probabilistic-programming system, Church, the researchers incorporate several custom modeling languages into Julia, a general-purpose programming language that was also developed at MIT. Each modeling language is optimized for a different type of AI modeling approach, making it more all-purpose. Gen also provides high-level infrastructure for inference tasks, using diverse approaches such as optimization, variational inference, certain probabilistic methods, and deep learning. On top of that, the researchers added some tweaks to make the implementations run efficiently.

Beyond the lab

External users are already finding ways to leverage Gen for their AI research. For example, Intel is collaborating with MIT to use Gen for 3-D pose estimation from its depth-sense cameras used in robotics and augmented-reality systems. MIT Lincoln Laboratory is also collaborating on applications for Gen in aerial robotics for humanitarian relief and disaster response.

Gen is beginning to be used on ambitious AI projects under the MIT Quest for Intelligence. For example, Gen is central to an MIT-IBM Watson AI Lab project, along with the U.S. Department of Defense’s Defense Advanced Research Projects Agency’s ongoing Machine Common Sense project, which aims to model human common sense at the level of an 18-month-old child. Mansinghka is one of the principal investigators on this project.

“With Gen, for the first time, it is easy for a researcher to integrate a bunch of different AI techniques. It’s going to be interesting to see what people discover is possible now,” Mansinghka says.

Zoubin Ghahramani, chief scientist and vice president of AI at Uber and a professor at Cambridge University, who was not involved in the research, says, “Probabilistic programming is one of the most promising areas at the frontier of AI since the advent of deep learning. Gen represents a significant advance in this field and will contribute to scalable and practical implementations of AI systems based on probabilistic reasoning.”

Peter Norvig, director of research at Google, who also was not involved in this research, praised the work as well. “[Gen] allows a problem-solver to use probabilistic programming, and thus have a more principled approach to the problem, but not be limited by the choices made by the designers of the probabilistic programming system,” he says. “General-purpose programming languages … have been successful because they … make the task easier for a programmer, but also make it possible for a programmer to create something brand new to efficiently solve a new problem. Gen does the same for probabilistic programming.”

Gen’s source code is publicly available and is being presented at upcoming open-source developer conferences, including Strange Loop and JuliaCon. The work is supported, in part, by DARPA.

Materials provided by Massachusetts Institute of Technology