🌊 Titanic Dataset Analysis 🚢¶

Let's dive into analyzing the Titanic dataset! We'll filter, transform, and explore different analyses to understand the factors affecting passenger survival. 🤓

1. Data Loading & Preparation 📂¶

  • The dataset is loaded from a CSV file.
  • Irrelevant columns are removed: SibSp, Parch, Name, PassengerId, Ticket, Cabin.
  • 👨‍👩‍👧‍👦 A new column FamilySize holds the family size (including the passenger); it has to be derived from SibSp and Parch before those columns are dropped.

2. Converting Categorical Columns to Numeric 🧮¶

  • Sex is converted to numeric values: male → 0, female → 1.
  • Embarked is converted to numeric values: C → 0, Q → 1, S → 2.
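
The preparation and encoding steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the rows are made up in the usual Titanic column layout (the path of the real CSV is not given here), and the FamilySize formula SibSp + Parch + 1 is an assumption based on "including the passenger".

```python
import io
import pandas as pd

# Made-up rows in the Titanic CSV layout (illustrative only, not real passengers).
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,Passenger A,male,22,1,0,T1,7.25,,S
2,1,1,Passenger B,female,38,1,0,T2,71.28,C85,C
3,1,3,Passenger C,female,26,0,0,T3,7.92,,S
4,0,3,Passenger D,male,,0,0,T4,8.46,,Q
5,1,2,Passenger E,female,14,1,2,T5,30.07,,C
"""

# With the real data, pd.read_csv would be given the CSV file path instead.
df = pd.read_csv(io.StringIO(csv_text))

# Assumed formula: siblings/spouses + parents/children + the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Drop the listed columns only after FamilySize has been derived.
df = df.drop(columns=["SibSp", "Parch", "Name", "PassengerId", "Ticket", "Cabin"])

# Encode the categorical columns with the mappings given above.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df["Embarked"] = df["Embarked"].map({"C": 0, "Q": 1, "S": 2})

print(df)
```

Note that `map` leaves any value outside the mapping (including a missing Embarked) as NaN, so missing values stay visible for later inspection.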

3. Data Visualization 📊¶

Variable Distributions 🔍¶

  • Histograms for continuous variables: Age, Fare.
  • Bar charts for categorical variables: FamilySize, Sex, Pclass, Embarked.

4. Survival Analysis 🧑‍⚖️¶

  • Analyzing survival rate based on:
    • Age (Age).
    • Ticket fare (Fare).
    • Family size (FamilySize).
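
For continuous columns, a survival rate only makes sense per group, so the values need to be binned first. Here is a minimal sketch with made-up rows; the bin edges and the labels (child / adult / senior, low / mid / high) are assumptions chosen for illustration, not values taken from the analysis.

```python
import pandas as pd

# Made-up rows (illustrative only).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 1, 0, 0, 1],
    "Age": [22, 38, 26, 35, 14, 54, 2, 27],
    "Fare": [7.25, 71.28, 7.92, 53.10, 30.07, 51.86, 21.08, 11.13],
    "FamilySize": [2, 2, 1, 2, 4, 1, 5, 3],
})

# Assumed age bands: (0, 18], (18, 40], (40, 80].
age_groups = pd.cut(df["Age"], bins=[0, 18, 40, 80],
                    labels=["child", "adult", "senior"])
rate_by_age = df.groupby(age_groups, observed=True)["Survived"].mean()

# Assumed fare bands: (0, 10], (10, 50], (50, 600].
fare_groups = pd.cut(df["Fare"], bins=[0, 10, 50, 600],
                     labels=["low", "mid", "high"])
rate_by_fare = df.groupby(fare_groups, observed=True)["Survived"].mean()

# FamilySize is already discrete, so it can be grouped directly.
rate_by_family = df.groupby("FamilySize")["Survived"].mean()

print(rate_by_age, rate_by_fare, rate_by_family, sep="\n\n")
```

Because Survived is 0/1, the mean of each group is exactly the share of survivors in that group.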

5. Categorical Survival Analysis 🏷️¶

  • Comparing survival rates based on:
    • Gender (Sex).
    • Travel class (Pclass).
    • Embarkation port (Embarked).
  • Displaying the data using countplot.
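
The comparison above could look something like this, assuming `countplot` refers to seaborn's function and that Survived is used as the `hue`. The rows are made up and already encoded; the per-category rates are computed alongside the plots so the numbers can be read directly.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up, already-encoded rows (illustrative only).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 1, 0, 0, 1],
    "Sex": [0, 1, 1, 1, 1, 0, 0, 1],
    "Pclass": [3, 1, 3, 1, 2, 1, 3, 3],
    "Embarked": [2, 0, 2, 2, 0, 2, 2, 2],
})

categorical = ["Sex", "Pclass", "Embarked"]
rates = {}

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, col in zip(axes, categorical):
    # Side-by-side bars: survivors vs non-survivors within each category.
    sns.countplot(data=df, x=col, hue="Survived", ax=ax)
    ax.set_title(f"Survival counts by {col}")
    # Share of survivors per category (mean of the 0/1 Survived column).
    rates[col] = df.groupby(col)["Survived"].mean()

fig.tight_layout()
print(rates["Pclass"])
```

The count plots show group sizes as well as outcomes, which helps when a high rate comes from only a handful of passengers.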

6. Fare & Survival Exploration 💰¶

  • Using boxplot and histplot to analyze the relationship between ticket fare and survival probability.
  • Splitting the data into two groups:
    • Low-cost tickets (Fare < 50).
    • High-cost tickets (Fare >= 50).

7. Pie Charts of Survival Rates 🍰¶

Next, pie charts summarize the survival rates by Age and Fare groups. 🥧
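
How the groups are formed is not spelled out here, so the sketch below makes two assumptions: Age is split at 18 and Fare at 50 (the same threshold as the fare exploration). Each pie shows survivors vs non-survivors within one group, using made-up rows.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows (illustrative only).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 1, 0, 0, 1],
    "Age": [22, 38, 26, 35, 14, 54, 2, 27],
    "Fare": [7.25, 71.28, 7.92, 53.10, 30.07, 51.86, 21.08, 11.13],
})

# Assumed group definitions (thresholds are illustrative).
groups = {
    "Age < 18": df[df["Age"] < 18],
    "Age >= 18": df[df["Age"] >= 18],
    "Fare < 50": df[df["Fare"] < 50],
    "Fare >= 50": df[df["Fare"] >= 50],
}

fig, axes = plt.subplots(1, len(groups), figsize=(16, 4))
for ax, (title, subset) in zip(axes, groups.items()):
    # Count outcomes in a fixed order: 0 = did not survive, 1 = survived.
    counts = subset["Survived"].value_counts().reindex([0, 1], fill_value=0)
    ax.pie(counts, labels=["Did not survive", "Survived"], autopct="%1.0f%%")
    ax.set_title(title)

fig.tight_layout()
```

Reindexing to `[0, 1]` keeps the wedge order and labels consistent across pies, even if one outcome is absent from a group.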


8. Correlation Heatmap 🔥¶

  • Plotting a heatmap of the pairwise correlations between the numeric columns.

9. Missing Values 💔¶

  • Detecting missing values (NaN) using a heatmap.
  • Graphical representation of missing data extent.
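
A common way to do this is to plot the boolean mask from `DataFrame.isna()` as a heatmap, so each missing cell shows up as a colored mark. The sketch below assumes seaborn's `heatmap` and uses made-up rows with a few NaN values inserted on purpose.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up rows with deliberately missing values (illustrative only).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Age": [22, np.nan, 26, 35, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
    "Embarked": [2, 0, np.nan, 2, 2],
})

# True where a value is missing, False otherwise.
missing_mask = df.isna()

# Number of missing values per column.
missing_counts = missing_mask.sum()
print(missing_counts)

# Rows on the y-axis, columns on the x-axis; missing cells stand out by color.
fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(missing_mask, cbar=False, cmap="viridis", ax=ax)
ax.set_title("Missing values per row and column")
fig.tight_layout()
```

The per-column counts give the extent of missing data as numbers, while the heatmap shows whether the gaps cluster in particular rows.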

🎉 That's it! We now have a comprehensive Titanic dataset analysis, including visualizations and statistical insights. 💡