Action understanding is an important cognitive faculty that can help robots efficiently encode, store, and retrieve observed human demonstrations. This is of great interest in cognitive robotics for creating memory units that encapsulate information gained from past experiences, which can then be recalled to adapt ongoing and future behaviors. We introduce a novel deep neural network architecture for encoding, storing, and recalling past action experiences in an episodic memory-like manner. The network creates a low-dimensional latent space representation of the observed actions. This latent formulation allows robots to classify different action types and to retrieve the episodes most similar to a query action (see model figure). The proposed network further helps robots predict and generate the next possible frames of the currently observed action.
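
To make the retrieval and classification idea concrete, the following is a minimal sketch, not the proposed network itself. It assumes that each stored episode has already been encoded into a fixed-length latent vector, and it uses cosine similarity with a nearest-neighbor vote, which are illustrative choices rather than details taken from the architecture. The function names `retrieve_similar` and `classify_by_neighbors`, the two-dimensional latent vectors, and the action labels are all hypothetical.

```python
from collections import Counter

import numpy as np


def retrieve_similar(query, memory, k=3):
    """Return indices and cosine similarities of the k stored latents closest to the query.

    Assumption: `memory` is an (n_episodes, latent_dim) array of latent codes
    produced by some encoder; the encoder itself is not modeled here.
    """
    memory = np.asarray(memory, dtype=float)
    query = np.asarray(query, dtype=float)
    # Normalize so that the dot product equals cosine similarity.
    mem_unit = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = mem_unit @ query_unit
    order = np.argsort(-sims)[:k]  # most similar first
    return order, sims[order]


def classify_by_neighbors(query, memory, labels, k=3):
    """Label the query by majority vote among its k nearest stored episodes."""
    idx, _ = retrieve_similar(query, memory, k)
    votes = Counter(labels[i] for i in idx)
    return votes.most_common(1)[0][0]


# Hypothetical toy memory: four episodes in a 2-D latent space.
memory = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["push", "push", "pour", "pour"]
query = [1.0, 0.05]

nearest, scores = retrieve_similar(query, memory, k=2)
predicted = classify_by_neighbors(query, memory, labels, k=3)
```

In this toy setting the query lies close to the two "push" episodes, so they are retrieved first and the vote assigns the "push" label. A learned latent space would replace the hand-written vectors, and a different similarity measure could be substituted without changing the overall structure.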