A basic skill for robots working in a human environment is to interact with objects. This frequently requires them to deal with objects they have not encountered before, i.e. objects that are completely unknown to them. To this end, they need to learn about these new objects, e.g. what they look like, so that the objects can be recognized when they are encountered again. Other properties of interest are shape and weight, which are particularly relevant for manipulation.
The task of learning the visual appearance of a new object for later recognition comprises two aspects. First, it is necessary to generate a visual descriptor that allows unambiguous and efficient recognition of the object in camera images. Second, before such a descriptor can be created for an unknown object, the object has to be segmented from its environment in the camera images of the robot. This is particularly difficult if both the object and its surroundings are unknown, or if several unknown objects are standing next to each other.
We have come to the conclusion that the task of object segmentation cannot, in general, be solved by purely passive observation. Therefore, we propose to let the robot physically interact with the object. Inducing motion in the object eliminates visual ambiguities and allows a clear and reliable segmentation. Based on the segmented object, visual descriptions can be learned. Moving the object also reveals different sides of it, which allows the creation of multi-view visual descriptions.
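To illustrate why induced motion helps, the following is a minimal sketch of one simple motion-based segmentation scheme (double frame differencing). It is not the method used in this work. It assumes a static camera, grayscale images, and an object that has been pushed twice so that three frames are available. The function names, the threshold value and the synthetic test frames are all hypothetical and chosen only for illustration.

```python
import numpy as np


def changed_pixels(frame_a, frame_b, threshold=30):
    """Boolean mask of pixels whose intensity changed by more than threshold.

    Frames are assumed to be 2D uint8 grayscale arrays of equal shape.
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return diff > threshold


def segment_moved_object(before, between, after, threshold=30):
    """Estimate the object mask in the middle frame from two pushes.

    Each single difference marks both the old and the new object position.
    Pixels that changed in both differences are taken to belong to the
    object as it appears in the middle frame.
    """
    first_change = changed_pixels(before, between, threshold)
    second_change = changed_pixels(between, after, threshold)
    return first_change & second_change


def make_frame(top, left, size=5, shape=(20, 20)):
    """Synthetic frame (illustration only): bright square on a dark background."""
    frame = np.zeros(shape, dtype=np.uint8)
    frame[top:top + size, left:left + size] = 200
    return frame


# The "object" is pushed from (2, 2) to (8, 8) and then to (14, 14).
frames = [make_frame(2, 2), make_frame(8, 8), make_frame(14, 14)]
mask = segment_moved_object(*frames)
```

In this toy setting the resulting mask covers exactly the square at its middle position. On real images, sensor noise, shadows, the robot hand in the field of view and overlapping object positions would all require additional handling.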
Having segmented (and thereby localized) an unknown object, the robot can also grasp it. Most classical grasp planners require a complete and accurate 3D model of the object, which is rarely available in practice, especially in the case of an unknown object. Instead, we develop reactive grasping approaches based on tactile and force sensor feedback from the robot hand.