Week 2: CST383 Introduction to Data Science

    This week felt packed: lots of new pandas tools and assignments came fast. It was challenging, but I’m starting to see how everything connects.

    Pandas clicked for me. Series and DataFrames make data much easier to work with than plain NumPy arrays because of their labeled indexes. Selecting columns, filtering rows, and using .loc[], .iloc[], .drop(), and boolean masks felt far more readable than regular loops.
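
    Here is a small sketch of the selection tools I mean. The DataFrame is made up for illustration; the column names and row labels are my own assumptions, not the actual assignment data.

```python
import pandas as pd

# Made-up ride data; column names and row labels are hypothetical.
rides = pd.DataFrame(
    {
        "duration_min": [12, 45, 7, 30],
        "user_type": ["member", "casual", "member", "casual"],
        "age": [34, 22, 51, 28],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Select one column (returns a Series).
durations = rides["duration_min"]

# Boolean mask: keep only rows where the ride lasted more than 20 minutes.
long_rides = rides[rides["duration_min"] > 20]

# .loc[] selects by label, .iloc[] selects by integer position.
r2_age = rides.loc["r2", "age"]
first_row = rides.iloc[0]

# .drop() returns a new DataFrame without the given column.
no_age = rides.drop(columns="age")
```

    The mask version replaces what would otherwise be a loop with an if statement inside it.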

    Aggregation was fun. .mean(), .min(), and .max() are easy, but groupby() was the hardest part. Breaking data into groups (age, gender, user type, stations) and comparing them feels super practical. I see why real analysis so often comes down to comparing categories, not just one big average. The syntax can get weird, though, when the result has a MultiIndex.
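
    A minimal sketch of the groupby() pattern, again with made-up data (the columns and values are assumptions for illustration). Grouping by two columns is where the MultiIndex shows up.

```python
import pandas as pd

# Made-up ride data; columns and values are hypothetical.
rides = pd.DataFrame(
    {
        "user_type": ["member", "casual", "member", "casual", "member"],
        "gender": ["F", "M", "M", "F", "F"],
        "duration_min": [10, 40, 20, 30, 15],
    }
)

# One grouping column: mean duration per user type.
by_type = rides.groupby("user_type")["duration_min"].mean()

# Two grouping columns: the result is a Series with a MultiIndex.
by_both = rides.groupby(["user_type", "gender"])["duration_min"].median()

# Index into a MultiIndex with a tuple of labels.
member_f = by_both.loc[("member", "F")]

# reset_index() turns the MultiIndex levels back into ordinary columns.
flat = by_both.reset_index()
```

    Calling reset_index() is the trick that makes the multi-level result feel like a normal table again.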

    Distributions were tricky at first. PDF vs. CDF confused me, but the CDF makes probabilities and percentiles much easier to read, since its height at x is the fraction of values at or below x. Skewness finally makes sense too.
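
    To convince myself, here is a rough sketch of an empirical CDF on made-up durations. The numbers and the ecdf helper are my own, not from the course materials.

```python
import numpy as np
import pandas as pd

# Made-up ride durations in minutes, with one long ride in the right tail.
durations = np.array([5, 8, 10, 12, 15, 20, 30, 60])

def ecdf(values, x):
    """Empirical CDF: the fraction of values less than or equal to x."""
    return float(np.mean(values <= x))

# Probability that a ride lasts 15 minutes or less.
p15 = ecdf(durations, 15)

# Percentiles go the other way: from a fraction back to a value.
median = float(np.percentile(durations, 50))

# In right-skewed data the long tail pulls the mean above the median.
mean = float(durations.mean())
skew = pd.Series(durations).skew()
```

    With these numbers the mean lands above the median and the skewness comes out positive, which is what a long right tail looks like.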

    Some random questions came to mind: how much data is actually “enough” to trust a pattern? If we keep adding more, does it always get better, or can it mess things up? How do the mistakes data scientists make affect real people’s lives?

    Overall, I feel much more confident reading CSV data, asking real questions about groups, summarizing with counts, fractions, and medians, and interpreting simple distributions. The bike-share assignment especially helped!
