June 17, 2018, Ilya Kuzovkin

Categories: Machine Learning, AI, Computer Science

Notes on ICRA 2018

Here are my notes from my first ever visit to a robotics conference, ICRA 2018. Coming from a machine learning background, many things were novel to me, and it was interesting to see the influence machine learning is having on the field. Neural networks were mentioned in every second talk, and while there is general skepticism about DNNs being the ultimate solution, people like the idea of robots learning their behaviors.

Workshops

Machine Learning in Planning and Control of Robot Motion

Robot Planning and Learning under Uncertainty: Data, Models, and Actions

Advocates for simultaneous learning of model and policy. QMDP-Net.

Integrating Algorithmic Planning and Deep Learning for Partially Observable Navigation

AlphaGo combined planning and learning, and the same idea should be applied to robotics. The application here is visual navigation under partial observability.

Machine-Learning Challenges in Planning for Hierarchically Organized Systems

Bremen harbor handles 6 million cars per year. Managing the port is a highly hierarchical task.

Deep Neural Networks for Swept Volume Prediction Between Configurations

The usual way of motion planning is to check all points along the trajectory. Another way is to compute the swept volume and then see whether that volume intersects with objects. The swept volume is the volume that contains all points in space the robot occupies during a motion. Swept volume is a better metric for motion planning than Euclidean distance in C-space. Computing the exact swept volume is intractable, so they learn an approximator: a deep net that maps the start and end configurations to the volume (3 FC layers, 100k samples). It works well and is fast. Intended for use in sampling-based motion planning.

Visual Robot Task Planning

Next state prediction from a high-level representation of the current state and the action the agent is about to take. Presented their DNN architecture for doing it. PyBullet + UR5 model as the environment. Predictions of future states are used in MCTS for planning.

Multi-Arm Self-Collision Avoidance: A Sparse Solution for a Big Data Problem

Sample poses in simulation to build a 5 million point dataset and train an SVM to predict {collision, no collision}. Works on real hardware (KUKA arms): if the arms are near a collision-prone situation, the system tries to modify their paths. An SVM prediction takes 2 ms.

The Right Tool for the Job: About Swiss Army Knives, Hammers, Motion Planning, Control, and Machine Learning by Oliver Brock

Specific tools vs. general approaches: tools should be specific when they can be. POMDPs and DNNs re-solve problems that are already solved; the motivation is the promise of a general solution, but maybe “what works well” should be the driver when choosing a tool. Motion planning: the problem definition differs depending on the task, control, uncertainty level, and perception. Idea of Expected Utility Sampling. Deep learning with algorithmic priors: combine data-driven methods with an algorithmic prior; there is no reason to get stuck on one particular side. Differentiable Particle Filter for localization.

Sample-efficient Reinforcement Learning via Difference Models

A small humanoid robot learns to walk via RL. Exploration via random noise breaks the robot in 5 minutes (with a pre-programmed controller it breaks in 8 hours). Damage is unavoidable, and on real robots such failures prevent the use of RL in the real world. What to do? The first step is to learn in simulation, but even for just walking the differences between simulation and reality are too big. Let's learn the difference between simulation and the real world! They use DDPG. Learn a model in the simulator -> run the policy on the real robot to collect training data -> use this data to train difference models -> learn a new policy that accounts for the difference in dynamics. The difference between sim and real is postulated as \(M: (state_{k}^{true}, action_{k}^{true}) \rightarrow (state_{k+1}^{true} - state_{k+1}^{sim})\).
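To make the difference-model idea concrete, here is a minimal sketch on a toy 1-D system. Everything in it is an assumption made for illustration: the dynamics in `sim_step` and `real_step`, the linear form of the difference model, and the least-squares fit. The talk used DDPG on a humanoid; this only shows the data flow of "collect real transitions, fit real minus sim, correct the simulator".

```python
import random

def sim_step(state, action):
    """Simplified simulator dynamics (hypothetical)."""
    return state + 0.1 * action

def real_step(state, action):
    """Stand-in for the real robot: weaker actuator plus friction (hypothetical)."""
    return state + 0.08 * action - 0.05 * state

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_difference_model(samples):
    """Least-squares fit of diff = w_s * state + w_a * action + b (normal equations)."""
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for state, action, diff in samples:
        feats = [state, action, 1.0]
        for i in range(3):
            rhs[i] += feats[i] * diff
            for j in range(3):
                A[i][j] += feats[i] * feats[j]
    return solve3(A, rhs)

# Collect transitions on the "real" system and record real minus sim.
rng = random.Random(0)
samples = []
for _ in range(200):
    s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    samples.append((s, a, real_step(s, a) - sim_step(s, a)))

w_s, w_a, b = fit_difference_model(samples)

def corrected_sim_step(state, action):
    """Simulator prediction plus the learned difference M(state, action)."""
    return sim_step(state, action) + w_s * state + w_a * action + b
```

A new policy would then be trained against `corrected_sim_step` instead of `sim_step`, so that most of the risky exploration happens off the robot.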

Workshop on Informative Path Planning and Adaptive Sampling

Data-driven Planning via Imitation Learning by Sanjiban Choudhury

Real-world motion planning problems. He worked on helicopters and quadrotors and has lots and lots (700 h) of flight data. The usual planning algorithms spend time checking paths that are clearly not suitable, so an adaptive planner is needed. Adaptation can be inside or outside the planner. Outside: for example, an ensemble of planners; a quadrotor flew 16 km autonomously while avoiding obstacles. Inside: learning a heuristic policy.

Quadrotor with the task to maximize exploration to map indoor space. Applied to inspection of power lines and underground tunnels.

Part 1: Search based planning

BFS tree expansion to find the goal node tries out all suitable expansion nodes. Can this be replaced by a policy that is smarter about picking the next node to expand? Issues: it requires a lot of online effort and is sample inefficient. Imitation learning in the context of MDPs, three paradigms:

Open loop demonstrations. Usual behavior cloning. Compounding error is the problem: the agent ends up in an unknown state.
Corrective feedback. Adapt to ever-changing goal. DAgger is an example.
Cost-to-go feedback. AggreVaTe (Ross and Bagnell, 2014). Similar to DAgger, but uses rollouts to change loss functions (?). They work in this paradigm. AggreVaTe \(\rightarrow\) SaIL (CoRL’17). Hallucinating oracle computes posterior over states. Compared to the A* approach, SaIL is 100x faster because it expands far fewer nodes.
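The question of replacing exhaustive expansion with a smarter selection policy can be illustrated with a small sketch. The search loop below takes a pluggable `select` function that picks which open node to expand next. The FIFO choice gives BFS; the Manhattan-distance choice is only a hand-written stand-in for a learned policy such as SaIL, and the grid, wall, and function names are all assumptions made for this example.

```python
def search(grid, start, goal, select):
    """Expand nodes until the goal is popped; select(open_list) returns the index to expand.

    Returns the number of expanded nodes, or None if the goal is unreachable.
    """
    open_list = [start]
    seen = {start}
    expansions = 0
    while open_list:
        node = open_list.pop(select(open_list))
        expansions += 1
        if node == goal:
            return expansions
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                open_list.append((nr, nc))
    return None

# 15x15 grid with a wall across row 7 that leaves a gap on the right (1 = obstacle).
grid = [[0] * 15 for _ in range(15)]
for col in range(12):
    grid[7][col] = 1
start, goal = (0, 0), (14, 14)

def fifo(open_list):
    """BFS: always expand the oldest node in the open list."""
    return 0

def closest_to_goal(open_list):
    """Stand-in for a learned policy: expand the node with the smallest Manhattan distance."""
    def distance(i):
        r, c = open_list[i]
        return abs(goal[0] - r) + abs(goal[1] - c)
    return min(range(len(open_list)), key=distance)

bfs_expansions = search(grid, start, goal, fifo)
policy_expansions = search(grid, start, goal, closest_to_goal)
print(bfs_expansions, policy_expansions)
```

On this map the heuristic policy reaches the goal with far fewer expansions than BFS, which is the kind of saving a learned expansion policy aims for on maps where no simple heuristic works.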

Part 2: Informative path planning

Nodes are sensing locations where the robot receives measurements. Each node has a usefulness measure (how much information about the world was revealed). The goal is to maximize the utility given the exploration budget. They again formulate this as a POMDP: the action is the next node to visit, the state is the list of visited nodes plus the world, and the reward is the utility.
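A minimal sketch of the state, action, and reward described above, under strong simplifications: the world is fully known (so this is an MDP rather than a POMDP), nodes lie on a line, utility is the number of newly revealed cells, and a greedy rule stands in for a learned policy. The node names, positions, and `reveals` sets are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    visited: tuple       # sensing nodes visited so far, in order
    revealed: frozenset  # world cells observed so far
    budget: float        # remaining travel budget

def travel_cost(a, b, positions):
    """Travel cost between two nodes (distance along a line in this toy setup)."""
    return abs(positions[a] - positions[b])

def step(state, node, positions, reveals):
    """Apply the action 'visit node'; the reward is the number of newly revealed cells."""
    cost = travel_cost(state.visited[-1], node, positions)
    if cost > state.budget:
        raise ValueError("not enough budget to reach node")
    new_cells = reveals[node] - state.revealed
    next_state = State(state.visited + (node,), state.revealed | new_cells, state.budget - cost)
    return next_state, len(new_cells)

def greedy_policy(state, positions, reveals):
    """Placeholder policy: the reachable unvisited node with the largest immediate gain."""
    best, best_gain = None, 0
    for node in positions:
        if node in state.visited:
            continue
        gain = len(reveals[node] - state.revealed)
        if travel_cost(state.visited[-1], node, positions) <= state.budget and gain > best_gain:
            best, best_gain = node, gain
    return best

positions = {"A": 0.0, "B": 1.0, "C": 3.0, "D": 10.0}
reveals = {
    "A": frozenset({1, 2}),
    "B": frozenset({2, 3, 4}),
    "C": frozenset({5}),
    "D": frozenset({6, 7, 8, 9}),
}

state = State(("A",), reveals["A"], 4.0)
total_utility = len(reveals["A"])
while True:
    node = greedy_policy(state, positions, reveals)
    if node is None:
        break
    state, reward = step(state, node, positions, reveals)
    total_utility += reward
```

Node `D` reveals the most but is out of budget, so the rollout never visits it; trading off reward against remaining budget is the part a learned policy is meant to handle better than this greedy rule.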

Marine Robotics: Planning, Decision Making, and Human-Robot Learning

Human input to improve path planning. The usual way of operating underwater vehicles is to manually define waypoints for exploration, receive data, come up with new waypoints, etc. Can we add more autonomy and a better mechanism for interaction with humans?

Given an initial trajectory provided by a scientist, optimize it under constraints on deviation and risk functions.

It seems that from the scientist’s perspective, deviation from the initial trajectory can make the whole run pointless. In my mind it is better to invest time in being able to autonomously avoid risks while still getting the measurements the scientist wanted.

Long-Term Autonomy and Deployment of Intelligent Robots in the Real-World

Place recognition through image sequence matching

Image sequence matching can be formulated as a graph search problem.
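One way to read this formulation, sketched under assumptions of my own: each (query frame, reference frame) pair is a node, its cost is the descriptor difference, and edges only allow the reference index to stay or move forward by a few frames. The best match is then the cheapest path through this layered graph, found here by dynamic programming. Scalar "descriptors" and the `max_step` parameter are illustrative choices, not details from the talk.

```python
def match_sequence(query, reference, max_step=2):
    """Match each query frame to a reference index via a cheapest path in a layered graph.

    Node (i, j) means "query frame i is matched to reference frame j".
    Edges go from (i - 1, p) to (i, j) for j - max_step <= p <= j.
    """
    n, m = len(query), len(reference)
    cost = [[float("inf")] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = abs(query[0] - reference[j])
    for i in range(1, n):
        for j in range(m):
            candidates = range(max(0, j - max_step), j + 1)
            best = min(candidates, key=lambda p: cost[i - 1][p])
            cost[i][j] = abs(query[i] - reference[j]) + cost[i - 1][best]
            back[i][j] = best
    # Backtrack from the cheapest node in the last layer.
    j = min(range(m), key=lambda k: cost[n - 1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]

# Toy descriptors: one number per image (real systems compare whole images or features).
reference = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
query = [3.1, 4.0, 4.9, 6.2]
matches = match_sequence(query, reference)
```

Matching whole sequences rather than single frames is what makes this robust: a single ambiguous image is pulled to the right place by its neighbors.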

One Map to Rule Them All? by Paul Newman

One algorithm for mapping without GPS, works in forest, field, underground mine, airport. Very strong autonomous driving group, read more: http://ori.ox.ac.uk/how-robotcar-works, http://ori.ox.ac.uk/theme/localisation, http://ori.ox.ac.uk/theme/mapping, http://ori.ox.ac.uk/media.

Learning Scan Context toward Long-term LiDAR Localization

Localization as classification with CNNs. Each place has an index, which serves as its class (does not scale?).

Probabilistic Observation Maps for Use in Long-Term Human-Robot Interactions

Cluttered environments.

Long-term Large-scale Mapping and Localization Using maplab

A new framework for long-term mapping and localization. Open source.

Long-Term Deployment of Self-Driving Cars and Trucks (Uber)

Self-driving trucks decrease the haul cost, which increases demand and creates more (support) jobs. Uber uses all sensors: LiDAR, radar, cameras; no sensor is perfect. Uber has rich simulations and real-world test tracks to test all possible scenarios, reuses collected data to update intelligence, etc. They have systems to find “interesting” situations in the data they collect: operators mark those, but they are also found automatically, for example when the vehicle braked hard, when it differed from human driving, or when predictions failed on offline data. An important direction to work on is estimating and using uncertainty. Using multiple sensors seems to be more important than limiting to one constrained case.

DejavuGAN: Multi-temporal Image Translation toward Long-term Robot Autonomy

Predictions of appearance changes. One unified network (GAN) to transform from one weather condition to another.

Efficient Map Management Scheme for LiDAR-based Vehicle Localization

Convert heavy point clouds to images for building maps from LiDAR scans. Image patches are enough for localization.

Long-Term Autonomy for Self-Driving Cars: Challenges and Opportunities by John Leonard

Life-long work on SLAM. In terms of deep learning, the difference between robotics and pure computer vision is that achieving 90% is still bad and cannot be applied in the field. Levels of autonomy are defined for self-driving; what about other areas? The “Guardian” system at Toyota detects when the human falls asleep or is distracted and takes over.

A Trusted Goal Reasoning and Planning Framework for Long Term Autonomy

Underwater vehicles with 6-month deployment times. Avoiding environment changes and failure modes (low battery).

Deploying Mobile Robots and Keeping them Autonomous by Joydeep Biswas

Their research is aimed at providing methods for non-technical users to adapt and correct robot behavior to deploy robots for long-term autonomy. https://amrl.cs.umass.edu/index.php?id=projects. Human-in-the-Loop SLAM: https://www.joydeepb.com/Publications/aaai2018_hitl-slam.pdf.

Designing for Long-Term Autonomy: Experiences With Collaborative Robots by Jonathan Kelly

STARS Laboratory in Toronto. Mobile manipulator with UR-10. Touch sensor to build surface point cloud.

One Year of Autonomy in Everyday Environments: The STRANDS Project by Tomas Krajnik

4D mapping, where the additional dimension is time: a probabilistic model predicts whether an object (or a voxel) is present on the map at a specific time. Nicely models periodicity in the environment: people being present at their workplaces, people at the cafeteria, etc.
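As a rough illustration of modeling periodicity, the sketch below fits a mean plus a single daily harmonic to binary "present / absent" observations of one map cell and uses it to predict presence at a given hour. The single-harmonic form, the known 24-hour period, and the synthetic office-hours data are my assumptions; this is not necessarily the model presented in the talk.

```python
import math

def fit_periodic_model(times, observations, period):
    """Fit p(t) = mean + a*cos(wt) + b*sin(wt) by Fourier projection.

    Assumes observations are evenly spaced and cover whole periods.
    """
    n = len(observations)
    w = 2.0 * math.pi / period
    mean = sum(observations) / n
    a = 2.0 / n * sum(x * math.cos(w * t) for t, x in zip(times, observations))
    b = 2.0 / n * sum(x * math.sin(w * t) for t, x in zip(times, observations))
    return mean, a, b

def predict_presence(model, t, period):
    """Predicted probability that the cell is occupied at time t, clipped to [0, 1]."""
    mean, a, b = model
    w = 2.0 * math.pi / period
    p = mean + a * math.cos(w * t) + b * math.sin(w * t)
    return min(1.0, max(0.0, p))

# Synthetic data: one week of hourly observations, person at the desk from 9:00 to 17:00.
hours = list(range(24 * 7))
observed = [1 if 9 <= h % 24 < 17 else 0 for h in hours]

model = fit_periodic_model(hours, observed, period=24.0)
p_noon = predict_presence(model, 12, 24.0)
p_night = predict_presence(model, 3, 24.0)
```

With more harmonics the model could also capture weekly patterns, such as empty offices on weekends.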

EXPO


Photos:
Farming robot
Two nuclear detection submarine bots and a nuclear coffee thermos.
All Australian robots look like spiders. Of course they do.
Flying car. Nope, does not fly yet.
LiDAR survey drone
A more intuitive control interface

Day 1

Robots and People: the Research Challenge by Rodney Brooks

iRobot, Rethink Robotics. The number of elderly people will be much greater. Urbanization. Climate change. P-values have to stop.

Learning Modes of Within-Hand Manipulation

Another realization of the idea of having a classifier predict an event (in this case whether an object will fall from a grip) from the robot state.

Optimization Beyond the Convolution: Generalizing Spatial Relations with End-To-End Metric Learning

Train a supervised network to understand which scenes are similar in terms of how two objects are positioned relative to each other. The network takes 3 input images: the first is a reference, the second shows positioning similar to the reference, and the third shows positioning different from the reference. The system is trained with a dual loss to output +1 for the pair of similarly positioned objects and -1 for the differently positioned ones. Works to some extent.

Machine Learning for Safe, High-Performance Control of Mobile Robots by Angela Schoellig

Trajectory tracking task with real robots. Enhance capabilities of mobile robots. Whenever we can solve a task without learning, we do it, as it is much easier. But there are more difficult problems for which it is hard to find a mathematical model. Examples: flying a robot in a slalom between poles, high-speed driving. These did not work without learning. Safety comes from defining uncertainty envelopes, which allow driving faster once there is less uncertainty.

Talk Resource-Efficiently to Me: Optimal Communication Planning for Distributed Loop Closure Detection

See http://acl.mit.edu for a 2x more efficient information exchange in a multi-robot setting. In situations where we need to send point clouds it could make a huge difference.

Getting a Grip on Robot Grasping: Metrics and Protocols for Reproducibility by Ken Goldberg

In grasping for 35 years, actually… not so much progress.

LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes

How to make our own segmentation data for robotic manipulation training. Take an RGBD video of a scene. Map the objects in the video. Now you can split the video into frames with objects labeled. They have generated a dataset with 350k images and 1M objects. Others can use the pipeline to create their own domain-specific datasets.

Robotics Startup Competition

Purple Robotics: Drones for work. Up to 90 minutes of flight! 30+ minutes with a 1.5 kg payload.
HEBI Robotics: Modular hardware and software platform.
EyeSight: Tactile display for blind people.
aubot: Modular telepresence bot.
Anybotics: Legged robots for dangerous environments, sensors for gas and stuff, www.anybotics.com

Day 2

Deep Reinforcement Learning for Navigation by Raia Hadsell

How mice navigate. Place cells and grid cells. The firing of grid cells forms a hexagonal grid. Grid cells emerge from training an LSTM to do path navigation. First it learns from velocities. Next they add a CNN, put it into an RL setting, and train it with A3C to reach a goal in a maze.

The RNN state readout will look similar for similar states. How much of the spatial grid pattern that formed is a property of the pattern of the maze? The mazes it was trained on are squares with rooms positioned in a grid pattern. Would a grid-like structure emerge if the training environment consisted mostly of straight corridors or open planes?

MERLIN — Memory Enabled Reinforcement Learning uses Differentiable Neural Computer for memory.

But all of these experiments run in a simulator. How applicable are these results to the real world? Example of learning city map graphs. Mention of progressive nets for transfer learning from one city LSTM to another city.

End-to-end Deep Learning for robots (Vincent Vanhoucke). Deep Learning transformed speech recognition (2010), vision (2012), translation (2014) from modular approach into end-to-end. Is robotics next?

Where to Look? Predictive Perception with Applications to Planetary Exploration

Needs of the Mars Curiosity rover. The radiation-hardened computer on Curiosity is slower than a Raspberry Pi, so processing is slow; manual driving is faster, even from Earth.

Design and Analysis of a Fixed-Wing Unmanned Aerial-Aquatic Vehicle

A vehicle that can swim, fly, and transition between the two media would have interesting applications. Works.

Learning to Learn by Pieter Abbeel

Deep reinforcement learning has its achievements, but how about the speed of getting there? Can fast reinforcement learning be achieved by meta-learning?
Pieter’s belief is that RNNs, by combining data and compute, are the best way to learn how to learn in a new environment, or to adapt to it.

Imitation learning. Has lots of successes in robotics. Model-Agnostic Meta Learning (MAML).

Use of simulation. Building realistic simulators is quite hard and compute-expensive. The idea of domain randomization: use a simple simulator, but with high variability of shapes, colors, and textures; as the agent learns to generalize, we hope that the real world will be just another “weird” texture once presented to it. Extension of the idea to grasping: train on randomly generated meshes, and then the real world is just another set of randomly generated objects.
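In code, domain randomization mostly amounts to resampling the visual parameters of the scene before every training episode. The sketch below shows only that sampling step; the parameter names, ranges, and the idea of passing the result to some simulator are hypothetical and not tied to any particular simulator API.

```python
import random

SHAPES = ["box", "sphere", "cylinder"]

def sample_scene(rng):
    """Draw one randomized scene configuration (all fields are illustrative)."""
    return {
        "object_shape": rng.choice(SHAPES),
        "object_rgb": tuple(rng.random() for _ in range(3)),
        "texture_id": rng.randrange(1000),
        "light_intensity": rng.uniform(0.2, 2.0),
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),
    }

rng = random.Random(42)
# One fresh configuration per training episode.
scenes = [sample_scene(rng) for _ in range(5)]
for scene in scenes:
    print(scene)
```

The wider these ranges, the more likely it is that real camera images fall inside the distribution the agent has already seen.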

Lifelong learning: agents keep learning after they are deployed. RoboSumo (Al-Shedivat, 2017): an adaptive robot learns to win against an initially stronger opponent.

Learning to learn allows algorithms to be discovered by compute and data rather than by human ingenuity, which is a more limited resource, and increasingly so over time relative to data.

PRM-RL: Long-Range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning

Need to take a closer look.

Aggressive Flight with Suspended Payloads Using Vision-Based Control

Estimate hanging payload trajectory and manipulate it via movements of the drone.

From Building Robots to Bridging the Gap between Robotics and AI by Sami Haddadin

Franka Emika. FOPNet: combining fundamental laws of physics with ANN learning worked for robotic arms, while a pure ANN was not able to capture the dynamics.

The UNAv, a Wind-Powered UAV for Ocean Monitoring: Performance, Control and Validation

The albatross (the bird) can fly amazing distances, 1000 km per day. How does it do that? It is a wind-powered system. Idea: make a sailboat with wings and minimal contact with water.

Design, Modeling and Control of Aerial Robot DRAGON: Dual-Rotor-Embedded-Multilink Robot with the Ability of Multi-Degree-Of-Freedom Aerial Transformation

3D transformation while flying! Can change its shape while in air.

Accurate and Adaptive in Situ Fabrication of an Undulated Wall Using an On-Board Visual Sensing System

Robots in construction at the construction site. Humans are heavily in the loop.

Day 3

A New Perspective on the Birds and the Bees: Biologically Inspired Aerial by Mandyam Srinivasan

Biorobotics. They use high-speed cameras to study how bees fly and try to replicate that in a flying robot. Gaps, landing, etc. Optic flow plays a big role in distance estimation and navigation. Their plane that works on those principles flies and lands nicely. How to detect movement while you are moving yourself? Analyze the optical flow: objects that do not follow the projected vectors must be moving.
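The last point can be sketched for the simplest case. Assuming the camera translates straight ahead through a static scene, every flow vector should point directly away from the focus of expansion (FOE), so a point whose flow direction deviates from that radial direction is a candidate moving object. The known FOE, the angle threshold, and the sample data are assumptions made for this example.

```python
import math

def flag_moving_points(points, flows, foe, max_angle_deg=20.0):
    """Return indices of points whose flow is not radial with respect to the FOE."""
    cos_limit = math.cos(math.radians(max_angle_deg))
    moving = []
    for i, ((x, y), (u, v)) in enumerate(zip(points, flows)):
        rx, ry = x - foe[0], y - foe[1]
        r_norm = math.hypot(rx, ry)
        f_norm = math.hypot(u, v)
        if r_norm == 0.0 or f_norm == 0.0:
            continue  # direction undefined, cannot decide
        cos_angle = (rx * u + ry * v) / (r_norm * f_norm)
        if cos_angle < cos_limit:
            moving.append(i)
    return moving

foe = (0.0, 0.0)
points = [(10.0, 0.0), (0.0, -5.0), (4.0, 4.0), (-6.0, 2.0)]
# Points 0, 1 and 3 flow radially outward; point 2 slides sideways instead.
flows = [(1.0, 0.0), (0.0, -0.5), (-0.3, 0.3), (-0.6, 0.2)]
moving = flag_moving_points(points, flows, foe)
```

With rotation or unknown ego-motion the expected flow field is more involved, but the principle of comparing observed flow against predicted flow stays the same.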

Overcoming Exploration in Reinforcement Learning with Demonstrations

Localizing in Tricky Spots by Paul Newman

Works on autonomous driving. Local navigation over a global coordinate frame. A way of de-shadowing an image when converting it from RGB to grayscale. Learning from experience by learning transformations between different weather conditions using CNNs: day, night, rain, snow, etc. A separate GAN for every transition for now. The same approach is used to deal with rain on lenses etc. “Ephemerality mask” to mask out moving objects. What if vision does not work at all? mmWave radar.

Put-in-Box Task Generated from Multiple Discrete Tasks by a Humanoid Robot Using Deep Learning

OptLayer – Practical Constrained Optimization for Deep RL in the Real World

Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling

Robowars Brisbane

The satellite event “Robowars Brisbane 2018” provided a good explanation of why people actually do all of the stuff above: