Estimating hand-object (HO) pose during interaction has advanced remarkably thanks to deep learning methods. Properly modeling the contact between the hand and the object is key to constructing a plausible grasp. Yet previous works usually focus on jointly estimating the HO pose and do not fully explore the physical contact involved in grasping. In this paper, we present an explicit contact representation, the Contact Potential Field (CPF), which models each hand-object contact as a spring-mass system. A natural grasp can then be refined by minimizing the elastic energy of these systems. To recover the CPF, we also propose a learning-fitting hybrid framework named MIHO. Extensive experiments on two public benchmarks show that our method achieves state-of-the-art performance on several reconstruction metrics, and that it produces more physically plausible HO poses even when the ground truth exhibits severe interpenetration or disjointedness.
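
To make the spring-mass idea concrete, the following is a minimal sketch and not the MIHO implementation. It assumes each contact is a linear (Hooke's law) spring between a hand anchor point and an object point, with energy 0.5 * k * ||distance||^2, and it refines only a rigid translation of the hand by gradient descent. The names `elastic_energy` and `refine_offset`, the stiffness values, and the translation-only parameterization are all hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def elastic_energy(anchors: Sequence[Vec3], targets: Sequence[Vec3],
                   stiffness: Sequence[float],
                   offset: Vec3 = (0.0, 0.0, 0.0)) -> float:
    """Sum of 0.5 * k * ||anchor + offset - target||^2 over all springs.

    Assumption: a quadratic Hooke's-law energy; the paper's exact
    energy terms are not given in the abstract.
    """
    energy = 0.0
    for a, p, k in zip(anchors, targets, stiffness):
        sq_dist = sum((a[d] + offset[d] - p[d]) ** 2 for d in range(3))
        energy += 0.5 * k * sq_dist
    return energy


def refine_offset(anchors: Sequence[Vec3], targets: Sequence[Vec3],
                  stiffness: Sequence[float],
                  lr: float = 0.1, steps: int = 200) -> Vec3:
    """Gradient descent on a rigid translation of the hand anchors.

    Assumption: only translation is optimized, for brevity. The step
    size must satisfy lr * sum(stiffness) < 2 for this to converge.
    """
    t = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for a, p, k in zip(anchors, targets, stiffness):
            for d in range(3):
                # d/dt of 0.5 * k * (a + t - p)^2 is k * (a + t - p)
                grad[d] += k * (a[d] + t[d] - p[d])
        for d in range(3):
            t[d] -= lr * grad[d]
    return (t[0], t[1], t[2])


# Two hypothetical contacts: hand anchors, object points, spring stiffness.
anchors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = [(0.1, 0.0, 0.0), (1.1, 0.2, 0.0)]
stiffness = [1.0, 3.0]

offset = refine_offset(anchors, targets, stiffness)
energy_before = elastic_energy(anchors, targets, stiffness)
energy_after = elastic_energy(anchors, targets, stiffness, offset)
```

For this quadratic energy the minimizer has a closed form, the stiffness-weighted mean of the anchor-to-target displacements, so the stiffer spring pulls the refined offset closer to its own contact. A fuller treatment would optimize the hand's articulated pose rather than a single translation.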