Replicating the Visual Pushing and Grasping paper




I’m working on a small summer research project, which I will detail if I end up getting it working in full.

The idea is heavily based off of the tossing bot paper, as I liked the idea of combining a physics baseline with learning of the error (the residuals).

My requirements were:
1. can be finished in 3 months starting from scratch
2. has cool demo (to, for instance, a 10 year old maker faire attendee) — so probably something dynamic, movement-wise
3. research worthy, since my qualification trials are at the end of the summer.

I think I’ll struggle most with the last point, but I’m hoping that in the process of working toward my goal, I’ll think of something that could be tweaked or improved.

Anyway, although the code for the tossing bot paper is not available yet, the same authors released a nice, well commented / documented code repository for their earlier paper, the visual pushing and grasping paper. (I guess, it seemed like they completed part of it during a google internship, so I feel better that I’m being paid far less and cannot spend much time on releasing quality code).

And I actually got it to work! Wow, replicable work. Okay, so I didn’t get it to work in full, but I do have a vastly simplified version of their code working on my ur5, with a d415 camera and a different gripper, using their pre-trained model out of the box! It outputs grasp predictions, and the ur5 moves to different locations where there are actually objects, picks them up, and then drops them.

I had to solve a few issues to get to this point, so I’ll outline here and explain in more detail later (hopefully — again time is short).

On the one hand, now I feel encouraged. On the other hand, it took me from June 11 until now (June 22), a full 10 days, to get a really mangled version up and running. And on a 3 month timeline, I really feel that should have been closer to 3 days…

Sigh. But now I know how to work with the ur5’s and the grippers, and how to talk to them over both straight up python and ROS. So maybe I’ll be employable in the future.

Relevant links:

What I did the last 10 days:


  1. Installed Ubuntu 18.04.1 on the lab computer
  2. Installed ROS

Re: ROS, I also learned a hard lesson: check out the right branch for your ROS packages, e.g. Kinetic Kame or Melodic whatever. Otherwise you will get a ton of errors.


1. Attached robotiq gripper to the robot arm, and got it functional.
1a. Required low profile screws of a short length (8mm) that I couldn’t find in the lab at first.

1b. Got it working directly with the teach pendant.
1c. There is a serial to USB converter, which for me happened to be inside the ur5 control box. I unplugged that and plugged it into my desktop (presumably, you could control the gripper directly from the ur5 interface when it’s plugged into the ur5 usb ports).
1d. Got it working with ROS. To be honest, this was a majoorrr pain. I kept getting all sorts of weird errors.
Now, instead, I talk to it directly in python, bypassing ROS entirely. Read the robotiq manuals, which give a clear command example.

Relevant links:
(mostly, just something like ser.write("\x09\x03\x07\xD0\x00\x03\x04\x0E") )
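
For what it's worth, that byte string has the shape of a Modbus RTU read request: device address 0x09, function code 0x03 (read holding registers), start register 0x07D0, three registers, then a two-byte CRC. Below is a minimal sketch that rebuilds it from those parts. The helper names `crc16_modbus` and `read_registers_frame` are made up for illustration. Also note that under Python 3, `ser.write` wants a bytes object (a `b"..."` literal) rather than a str.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_registers_frame(device: int, start: int, count: int) -> bytes:
    """Build a Modbus RTU 'read holding registers' (0x03) request."""
    body = bytes([device, 0x03]) + start.to_bytes(2, "big") + count.to_bytes(2, "big")
    # Modbus RTU appends the CRC with the low byte first.
    return body + crc16_modbus(body).to_bytes(2, "little")


frame = read_registers_frame(device=0x09, start=0x07D0, count=3)
print(frame.hex(" "))  # 09 03 07 d0 00 03 04 0e
```

With pyserial, this would go out as `ser.write(frame)` on an already opened port. The port name and serial settings depend on the setup and are not shown here.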


1. Attempted to install realsense-viewer on my Ubuntu 19.10 install. Apparently the deb install only works with a much older version of the Linux kernel, so I started patching things and compiling from source. I did things like patch the patches, since the patches were for 18.04.2 and not… 19.10… I did get it working, but my main lesson was to install 18.04.1 on the ur5 desktop.

Relevant links:

  • Debug log
  • Start here
  • Fail, start to compile from source
  • See patch script files



1. Plugged it in
1b. Major lesson: the pendant shows coordinates, but the ones in VIEW are different from the ones that are reported over serial / that you send via python. You have to use the dropdown to select BASE.
Additionally, there are two ways to specify configurations, which cannot be directly mixed and matched. A joint config is the angle of each of the 6 joints. The other one is a pose in coordinates (presumably the ur5 has a built-in IK solver and path planner to move to it), but note that the final tool pose gives its orientation in axis-angle coordinates, not as a rotation of each joint!!! This was super confusing to debug.
2. Learned to use ur_modern_driver and got it working; ignore the other package.
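
To make the joint config versus pose distinction above concrete, here is a small sketch. It formats the two kinds of URScript move commands (`movej` with six joint angles, `movel` with a `p[x, y, z, rx, ry, rz]` pose) and converts the axis-angle part of a pose into a rotation matrix with Rodrigues' formula. The function names and the acceleration/velocity defaults are placeholders, not values from my setup or from the VPG code.

```python
import math


def movej_command(joints, a=1.0, v=0.5):
    """Joint-space move: six joint angles in radians."""
    q = ", ".join(f"{x:.4f}" for x in joints)
    return f"movej([{q}], a={a}, v={v})\n"


def movel_command(pose, a=1.0, v=0.1):
    """Tool-space move: x, y, z in meters plus an axis-angle rotation vector."""
    p = ", ".join(f"{x:.4f}" for x in pose)
    return f"movel(p[{p}], a={a}, v={v})\n"


def rotvec_to_matrix(rx, ry, rz):
    """Rodrigues' formula: rotation vector (unit axis * angle) -> 3x3 matrix."""
    angle = math.sqrt(rx * rx + ry * ry + rz * rz)
    if angle < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / angle, ry / angle, rz / angle
    c, s = math.cos(angle), math.sin(angle)
    v = 1.0 - c
    return [
        [kx * kx * v + c, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, ky * ky * v + c, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, kz * kz * v + c],
    ]


# A rotation vector of (0, pi, 0) is a half turn about the y axis,
# which flips the tool z axis to point straight down.
pose = [0.3, -0.2, 0.1, 0.0, math.pi, 0.0]
print(movel_command(pose), end="")
print(rotvec_to_matrix(*pose[3:]))
```

The last three numbers of a pose are one vector whose direction is the rotation axis and whose length is the angle, so they cannot be read as three separate joint or Euler angles.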

VPG code

Ah, maybe I’ll save that for next time…
I mostly fussed around with the script for a long time. An entire 1-2 days were wasted because I didn’t realize the pendant coordinates were off by 40 cm on the z axis; combined with the joint config vs position specification issue, I was confused about why the robot was constantly trying to go through the table. I suspected it was something like the z axis issue, but really it was using this library to get the pose out (such a great library!) that helped me figure it out.

Additionally, I wasn’t certain how the tool offset worked until I opened the code. I thought it was literally the offset to wherever I wanted the centerpoint on the gripper to be, but no, it’s literally the offset to what the UR5 thinks is the centerpoint of its tool, which is the point it reports the coordinates of.

I’m currently still having some z-depth issues, so I’m working through the very detailed (!) parameters given in the paper to see what is going on with that.

And for now, took out all the heightmap rotation stuff, so it’s just a straight up and down grasp. Also had to hardcode in some workspace limits. There’s something wonky going on with the exposure settings on the realsense too.
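
As a rough illustration of what hardcoded workspace limits can look like, here is a minimal sketch that clamps a target position to a box in the robot BASE frame before it is sent. The limit values and function names are hypothetical, not the ones I used or the ones in the VPG code.

```python
# Hypothetical limits in the robot BASE frame, in meters: (min, max) per axis.
WORKSPACE_LIMITS = {
    "x": (0.30, 0.70),
    "y": (-0.20, 0.20),
    "z": (0.02, 0.40),  # lower z bound sits just above the table
}


def is_inside_workspace(x, y, z, limits=WORKSPACE_LIMITS):
    """Return True only if every coordinate lies within its axis limits."""
    return all(
        limits[axis][0] <= value <= limits[axis][1]
        for axis, value in zip("xyz", (x, y, z))
    )


def clamp_to_workspace(x, y, z, limits=WORKSPACE_LIMITS):
    """Pull each coordinate back inside its (min, max) range."""
    clamped = []
    for axis, value in zip("xyz", (x, y, z)):
        low, high = limits[axis]
        clamped.append(min(max(value, low), high))
    return tuple(clamped)


# A target whose z is 40 cm too low gets caught instead of going through the table.
target = (0.5, 0.0, -0.38)
print(is_inside_workspace(*target))  # False
print(clamp_to_workspace(*target))   # (0.5, 0.0, 0.02)
```

Rejecting out-of-range targets outright (rather than clamping them) is the more conservative choice while the coordinate frames are still in doubt.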

OH and there was that 2 hours I spent figuring out that my extension cable looks like a USB 3 cable (blue ends, extra pins) but was behaving as a USB 2.0 extension cable… ordered some off of amazon that did the trick (also lsusb -t was very helpful).

Anyway, here’s a video of what it’s doing for now (I’ll rehost onto youtube for longevity when I get the chance)
And a more exciting dynamic maneuver

And pictures

Yesterday, when it was kinda working

Hey look, I selected BASE. T__T

Calibration in progress. With some limits to the movel command, punctuated by “I guess it’s safe *shrug*”:

And a blurry picture of my lab. Had to crop out my robot a bit to avoid faces.

Until next time, folks. Hopefully I’ll have a working demo of something of my own soon. Right now, I’m just running a mutilated version of someone else’s code. But I’m happy to be working with actual robots again.

projects blog (nouyang)