Week 3: CST383 Introduction to Data Science
This week was loaded with new material, and I feel like we have already moved well past the "introduction" to Data Science... Still, I am getting comfortable exploring and understanding a single continuous variable. df.info() and df.describe() made it easy to understand a data set before jumping into visualization, which in turn made the plots easier to interpret. Comparing histograms, density plots, and box plots showed me that the same data can look very different depending on how it's visualized, and that each plot highlights a different aspect of the distribution (box plots are still confusing, though!). Treating skew with a log transform was interesting; it felt like a trick! I learned how a probability density function (PDF) works, including the idea that probabilities come from the area under the curve, not from the height of the curve itself. The 68–95–99.7 rule was especially useful. ...
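To convince myself of the "area, not height" idea and the 68–95–99.7 rule together, here is a minimal sketch in plain Python (standard library only, no pandas). It assumes a normal distribution. The helper names `normal_pdf` and `area_under_pdf` are my own hypothetical ones, and the area is approximated with the trapezoid rule instead of a library CDF.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal density curve at x. This is NOT a probability."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate P(a <= X <= b) as the area under the PDF (trapezoid rule)."""
    width = (b - a) / steps
    # Endpoints count half, interior points count fully
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width

# 68-95-99.7 rule: area within k standard deviations of the mean
mu, sigma = 0.0, 1.0
for k in (1, 2, 3):
    p = area_under_pdf(mu - k * sigma, mu + k * sigma, mu, sigma)
    print(f"within {k} sd: {p:.4f}")

# Height is not probability: a narrow curve rises above 1 at its peak
print(normal_pdf(0.0, sigma=0.1))
```

Running it prints 0.6827, 0.9545, and 0.9973 for 1, 2, and 3 standard deviations, and about 3.99 on the last line. That last value is a density greater than 1, which could never be a probability. Only the area under the curve over an interval is one.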