Week 3: CST383 Introduction to Data Science

    This week was loaded with new material, and it feels like we have already moved past the "introduction" part of Data Science. I am getting comfortable exploring and understanding a single continuous variable.

    df.info() and df.describe() made it very easy to understand a data set before jumping into visualization, which in turn made the plots easier to interpret.
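
    A small sketch of that first look, assuming a made-up DataFrame (the column names and values here are hypothetical, not from the course data):

```python
import io
import pandas as pd

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    "height_cm": [160.0, 172.5, 181.0, None, 168.0],
    "weight_kg": [55.0, 70.2, 88.1, 64.0, 61.5],
    "group": ["A", "B", "A", "C", "B"],
})

# info() reports column names, non-null counts, and dtypes.
# It prints to stdout by default; passing a buffer lets us keep the text.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)

# describe() summarizes the numeric columns: count, mean, std, min, quartiles, max.
summary = df.describe()
print(summary)
```

    The non-null count from info() is a quick way to spot missing values (one height is missing here), and describe() gives a rough idea of center and spread before any plot is drawn.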

    Comparing histograms, density plots, and box plots helped me see that the same data can look very different depending on how it’s visualized, and that each plot highlights different aspects of the distribution (box plots are still confusing, though!). Treating skew with a log transform was interesting; it felt like a trick!
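
    The log "trick" can be checked numerically. This is a minimal sketch on simulated right-skewed data (a log-normal sample, which is an assumption for illustration); the skewness helper is hand-written here, not a course function:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated right-skewed, strictly positive data (log-normal by construction).
x = rng.lognormal(mean=3.0, sigma=0.8, size=10_000)

def skewness(values):
    """Sample skewness: the mean cubed z-score (near 0 for symmetric data)."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))

skew_raw = skewness(x)
skew_log = skewness(np.log(x))  # log only works for strictly positive values

print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

    The raw sample has a long right tail and clearly positive skewness, while the log-transformed values are close to symmetric. Real data will rarely clean up this neatly, since this sample was built to be log-normal.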

    I learned how a probability density function (PDF) works, including the idea that probabilities come from the area under the curve, not the height of the curve itself. The 68–95–99.7 rule was especially useful. 
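
    Both ideas can be checked with a few lines of plain Python. This sketch assumes a normal distribution and approximates the area under its PDF with the trapezoid rule; the function names are just for illustration:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate P(a <= X <= b) with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width

# Probability of landing within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within {k} sd: {area_under_pdf(-k, k):.4f}")

# The height is a density, not a probability: with a small sigma it exceeds 1.
peak_height = normal_pdf(0.0, sigma=0.1)
print(f"peak height when sigma=0.1: {peak_height:.2f}")
```

    The three areas come out to roughly 0.68, 0.95, and 0.997, which is where the rule gets its name. The peak height of about 3.99 shows why the curve's height cannot be read as a probability.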

    Covariance shows how variables move together but depends on scale, which makes correlation more useful because it is standardized between −1 and 1.
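
    A short sketch of the scale issue, using simulated height and weight values (the numbers and variable names are made up). Both measures are computed by hand from centered variables:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated data: weight loosely increases with height, plus noise.
height_m = rng.normal(1.70, 0.10, size=500)
weight_kg = 60 * height_m - 35 + rng.normal(0, 5, size=500)

def covariance(x, y):
    # Center each variable by subtracting its mean, then average the products.
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / (len(x) - 1))

def correlation(x, y):
    # Dividing by both standard deviations removes the units.
    return covariance(x, y) / float(x.std(ddof=1) * y.std(ddof=1))

height_cm = height_m * 100  # same data, different unit

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
corr_m = correlation(height_m, weight_kg)
corr_cm = correlation(height_cm, weight_kg)

print(f"covariance  (m): {cov_m:.3f}   (cm): {cov_cm:.3f}")
print(f"correlation (m): {corr_m:.3f}   (cm): {corr_cm:.3f}")
```

    Switching from meters to centimeters multiplies the covariance by 100 but leaves the correlation unchanged, which is why correlation is easier to compare across variables.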

    Working through examples helped me understand the importance of centering variables by subtracting the mean when computing both measures. We also explored the relationships between two continuous variables using scatter plots.

    Scatter plots make it easier to see patterns such as positive or negative trends, clusters, and outliers that aren’t obvious from summary statistics. 

    Random variables are used to model uncertainty and describe outcomes using probability distributions rather than single values. 

    I also learned the difference between discrete and continuous random variables, and how concepts like expectation (mean) and variance describe their behavior. Seeing that random variables can be sampled from helped connect the theoretical definitions to real data and simulations.
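
    Here is a minimal simulation sketch of that connection. It assumes a fair six-sided die as the discrete example and a normal distribution as the continuous one (both are illustrative choices):

```python
import numpy as np

# Discrete random variable: a fair six-sided die.
faces = np.arange(1, 7)
probs = np.full(6, 1 / 6)

# Theoretical expectation and variance from the probability distribution.
expected = float(np.sum(faces * probs))                    # 3.5
variance = float(np.sum((faces - expected) ** 2 * probs))  # 35/12

rng = np.random.default_rng(42)
rolls = rng.choice(faces, size=100_000, p=probs)          # discrete samples
heights = rng.normal(loc=170.0, scale=8.0, size=100_000)  # continuous samples

print(f"die:    theory mean={expected:.3f}, sample mean={rolls.mean():.3f}")
print(f"die:    theory var={variance:.3f}, sample var={rolls.var():.3f}")
print(f"normal: theory mean=170.000, sample mean={heights.mean():.3f}")
print(f"normal: theory var=64.000, sample var={heights.var():.3f}")
```

    With a large sample, the sample mean and variance land close to the theoretical values, so the definitions and the simulated data tell the same story.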

    One thing that stood out to me was how important it is to think about why I choose a particular plot, not just how to create it, since a visualization can be very sensitive to the choices behind it yet very effective when those choices fit the data. I’m also still working on building an intuitive understanding of variance in different contexts, especially when comparing random variables that are on different scales.
