Conclusion

  • The goal of this mini project is to investigate how factors such as GPA, GRE, Major, Gender, and Domestic/International status are weighted during the admission process. Exploratory data analysis, feature selection, and predictive modeling revealed several notable relationships:
    • Candidates who were extended an offer of admission have an overall higher GPA than those who were not.
    • The factors that do not contribute significantly to the decision are Gender, Dom_int and Rank, which is plausible. Admission processes are now much fairer with respect to gender than in the past, and each program or institution tends to accept a similar number of students of each gender, so Gender carries little weight here.
    • Whether a candidate is a domestic or international student also matters little: students from all around the world appear to have a similar chance of being admitted and of being valued above their peers.
    • Rank of the undergraduate institution also matters little. A likely reason is that each graduate program or department has its own standards for choosing students, and candidates from higher-ranked undergraduate institutions may not have the right background for a particular program.
    • Interestingly, GRE, which might be expected to be one of the most important features, also has a low Mean Decrease Accuracy. One possible reason is that when GRE scores do not vary much, the admission committee also looks at extracurricular activities and other aspects of a candidate (such as work or internship experience, or publications in the field of study), which makes GRE a weaker determinant here.
  • According to the predictive model and the Random Forest feature importance plots, the factors that weigh most with the admission committee are GPA, Major and TOEFLcut. The prediction accuracy is above 78%, which indicates reasonably good predictive performance.
  • Two of these factors, GPA and TOEFLcut, are academic measures, which is to be expected: grades and test results are among the few standard criteria that indicate whether a candidate has the strong learning ability and basic qualifications that admission faculty look for.
  • The third factor is undergraduate Major, which is interesting: although the hierarchical clustering result shows no strong evidence that candidates from the same major share common traits, the model results indicate that Major does play an important role in the admission process. Some majors may be preferred by admission faculty, and Major may stand in for other aspects of a candidate that are not collected in this dataset.
  • Overall, GPA, Major and TOEFLcut carry more weight than other factors in the admission process, and these are also the traits that make candidates stand out from their peers.
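
The Mean Decrease Accuracy figures discussed above rest on a permutation idea: shuffle one feature, re-score the model, and record how much accuracy drops. The sketch below illustrates that idea only and is not the code used in this project. A Random Forest computes the measure per tree on out-of-bag samples, whereas this version uses an arbitrary classifier on a fixed dataset. The toy rows, the labels, the `toy_predict` rule, and the function names are all made up for illustration.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(rows)


def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = {}
    for feature in rows[0]:
        drops = []
        for _ in range(n_repeats):
            # Shuffle this one column across rows, leaving the others intact.
            shuffled = [row[feature] for row in rows]
            rng.shuffle(shuffled)
            permuted = [dict(row, **{feature: value})
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances[feature] = sum(drops) / n_repeats
    return importances


# Hypothetical toy data: admission depends on GPA only, never on Gender.
rows = [{"GPA": gpa, "Gender": gender} for gpa, gender in [
    (3.9, "F"), (3.8, "M"), (3.7, "F"), (3.6, "M"),
    (2.9, "F"), (2.8, "M"), (2.7, "F"), (2.6, "M"),
]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def toy_predict(row):
    """Stand-in for a fitted model: admit when GPA is at least 3.5."""
    return 1 if row["GPA"] >= 3.5 else 0


importances = mean_decrease_accuracy(toy_predict, rows, labels)
for name, drop in importances.items():
    print(f"{name}: mean decrease in accuracy = {drop:.3f}")
```

A feature the model never uses, such as Gender in this toy example, shows no decrease at all, while shuffling a feature the model relies on lowers accuracy. A low Mean Decrease Accuracy for GRE would be read in the same way.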

Future Work

    • Because the data is synthetic, only a limited amount of information can be extracted from it.
    • For more detailed results, it would help to learn more about each attribute, for instance which section the GRE score represents (verbal or quantitative) and the actual names of the colleges.
    • More factors could be added to make the prediction model more robust, such as whether the student has prior work experience, has won scholarships, or has participated in and contributed to relevant projects.
    • One could also investigate which factors affect Matriculating, OfferOfAdmissionExtended, GRE, etc., and rerun the logistic regression model to see the classification results.