EXPLORATORY DATA ANALYSIS: HAPPINESS INDEX 2021

Rakesh Sethu NP
4 min read · Oct 16, 2022


The World Happiness Index is a survey of the state of global happiness, in which every country is scored on criteria such as social support, life expectancy, and corruption.

Thank you to Aakash N S, as I learned how to approach any dataset through the course https://jovian.ai/learn/data-analysis-with-python-zero-to-pandas

Also, thank you Armonia for the idea from your blog, as I am very new to blogging.

Let's get started:

This is a Kaggle dataset that you can easily download and analyze. I will also upload the 2022 dataset by the end of the year.

We need to import a few libraries before loading the dataset. Let's import those.

Now let's load the dataset. I have the CSV file ready, downloaded from Kaggle.
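
The original code cells did not survive in this post, so here is a minimal sketch of the import and loading step. The file name `world-happiness-report-2021.csv` and the column names are assumptions about the Kaggle download, and the two rows below are made-up placeholders so that the snippet runs without the file.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV so this sketch runs anywhere.
# The rows are made-up placeholders, NOT real survey values.
sample_csv = io.StringIO(
    "Country name,Regional indicator,Ladder score,Logged GDP per capita\n"
    "Country A,Region X,7.5,10.8\n"
    "Country B,Region Y,4.9,8.1\n"
)

# With the real file you would pass its path instead, for example:
# happiness_raw_df = pd.read_csv("world-happiness-report-2021.csv")  # assumed file name
happiness_raw_df = pd.read_csv(sample_csv)
print(happiness_raw_df)
```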

Now that this is done, let's do some data cleaning. Below are the steps involved.

TO SEE FIRST 5 ROWS
TO SEE LAST 5 ROWS
TO SEE NUMBER OF ROWS AND COLUMNS
TO SEE DATA TYPES
TO CHECK FOR ANY MISSING VALUES
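
The steps listed above map onto a handful of standard pandas calls. This is a sketch on a small made-up frame (placeholder values, with one missing value added on purpose), not the real dataset.

```python
import pandas as pd

# Made-up placeholder rows; one missing value is included on purpose.
df = pd.DataFrame({
    "Country name": ["Country A", "Country B", "Country C",
                     "Country D", "Country E", "Country F"],
    "Ladder score": [7.5, 6.8, 6.1, 5.4, 4.9, None],
    "Social support": [0.95, 0.90, 0.85, 0.80, 0.70, 0.60],
})

first_rows = df.head()                   # first 5 rows
last_rows = df.tail()                    # last 5 rows
n_rows, n_cols = df.shape                # (number of rows, number of columns)
column_types = df.dtypes                 # data type of each column
missing_per_column = df.isnull().sum()   # count of missing values per column

print(first_rows, last_rows, sep="\n\n")
print(n_rows, n_cols)
print(column_types)
print(missing_per_column)
```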

We don't need every column for the analysis, so we take only the required columns.
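
Selecting a subset of columns could look like the sketch below. The column list is an assumption about which fields were kept, and the values are placeholders.

```python
import pandas as pd

# Placeholder frame with one column we do not need.
raw_df = pd.DataFrame({
    "Country name": ["Country A", "Country B"],
    "Regional indicator": ["Region X", "Region Y"],
    "Ladder score": [7.5, 4.9],
    "upperwhisker": [7.6, 5.0],          # not needed for this analysis
    "Logged GDP per capita": [10.8, 8.1],
})

# Assumed list of required columns.
selected_columns = [
    "Country name",
    "Regional indicator",
    "Ladder score",
    "Logged GDP per capita",
]

# .copy() gives an independent frame, so later edits do not touch raw_df.
happiness_df = raw_df[selected_columns].copy()
print(happiness_df.columns.tolist())
```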

Renaming the columns, since names containing spaces cannot be used with dot notation in pandas and are awkward to type.
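
A sketch of the renaming step using `DataFrame.rename`. The snake_case names are hypothetical choices for illustration, not necessarily the names used in the original notebook.

```python
import pandas as pd

# Placeholder row, not real data.
happiness_df = pd.DataFrame({
    "Country name": ["Country A"],
    "Ladder score": [7.5],
    "Logged GDP per capita": [10.8],
})

# Hypothetical new names: lower case, underscores instead of spaces.
rename_map = {
    "Country name": "country_name",
    "Ladder score": "ladder_score",
    "Logged GDP per capita": "logged_gdp_per_capita",
}
happiness_df = happiness_df.rename(columns=rename_map)

# Names without spaces also work with dot notation.
print(happiness_df.ladder_score)
```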

Result: columns renamed.

Now let's do EDA (EXPLORATORY DATA ANALYSIS) AND VISUALIZATION

We will do some plotting to see the relationships between the parameters.

Let's begin by importing matplotlib.pyplot and seaborn

Scatterplot between Happiness and GDP
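
A scatter plot like this can be sketched with seaborn's `scatterplot`, coloring the points by region. The column names follow the hypothetical snake_case renaming, and the data points are placeholders rather than real scores.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder values, not real survey data.
df = pd.DataFrame({
    "regional_indicator": ["Region X", "Region X", "Region Y", "Region Y"],
    "logged_gdp_per_capita": [10.8, 10.5, 8.1, 7.9],
    "ladder_score": [7.5, 7.2, 4.9, 4.5],
})

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(
    data=df,
    x="logged_gdp_per_capita",
    y="ladder_score",
    hue="regional_indicator",  # one color per region
    s=100,                     # marker size
    ax=ax,
)
ax.set_title("Happiness score vs GDP per capita")
ax.set_xlabel("Logged GDP per capita")
ax.set_ylabel("Ladder score")
# In a notebook the figure renders inline; in a script call plt.show().
```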

Most of the happiest countries are in Western Europe, where GDP is also high and happiness scores approach 8. This suggests that the more a country earns, the more people are employed and the happier they tend to be.

After Western Europe comes North America, with scores around 7.

In the middle are Latin American countries like Brazil, a few East and South Asian countries, and countries in the Commonwealth of Independent States region like Armenia, Belarus, Georgia, and Kazakhstan.

Last come African countries like Cameroon, Ethiopia, and Angola.

There is one outlier country, marked in blue, which we presume is Afghanistan.

Let us look at GDP by region.

For that, we have to find the sum of GDP for each regional indicator.

I have used a pie chart to visualize GDP, as it is a traditional and easy way to understand GDP or any other value as a share of a total.
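
The regional sum and the pie chart can be sketched with `groupby` and matplotlib's `pie`. Column names are the hypothetical renamed ones and the values are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values: Region X has three countries, Region Y has one.
df = pd.DataFrame({
    "regional_indicator": ["Region X", "Region X", "Region X", "Region Y"],
    "logged_gdp_per_capita": [10.0, 9.0, 8.0, 11.0],
})

# Sum the per-country GDP figure within each region.
gdp_by_region = df.groupby("regional_indicator")["logged_gdp_per_capita"].sum()

fig, ax = plt.subplots(figsize=(8, 8))
wedges, texts, autotexts = ax.pie(
    gdp_by_region.to_list(),
    labels=gdp_by_region.index.to_list(),
    autopct="%1.1f%%",  # print each slice's share
)
ax.set_title("GDP by region")
print(gdp_by_region)
```

Note that in this toy frame Region X gets the larger slice only because it has more countries, even though Region Y has the highest single value. A sum of per-country figures always grows with the number of countries in the region.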

Surprisingly, the largest share of GDP comes from Sub-Saharan Africa, with 20.7%.

It is followed by Western Europe and Central and Eastern Europe.

Another surprise is that North America's contribution looks small: at 3.1%, it is the least of all.

To explain this, look at the number of countries in each region below.
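
Counting countries per region is a one-liner with `value_counts`. This sketch uses placeholder rows and the hypothetical column name `regional_indicator`.

```python
import pandas as pd

# Placeholder rows: one row per country.
df = pd.DataFrame({
    "country_name": ["Country A", "Country B", "Country C", "Country D"],
    "regional_indicator": ["Region X", "Region X", "Region X", "Region Y"],
})

# Number of countries in each region, largest first.
country_counts = df["regional_indicator"].value_counts()
print(country_counts)
```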

The counts above make this clear: Africa has the most countries contributing to the GDP sum (36), followed by Western Europe (21).

As we can see, Ladder score and GDP per capita are highly correlated, as we saw in the previous scatter plot. This means that the higher the GDP, the happier people are in their country.

We can also see that the Ladder (happiness) score is highly correlated with social support and life expectancy, and much less correlated with perceptions of corruption. Countries with a high score tend to be the ones where people perceive less corruption.
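
The correlations discussed above can be computed with `DataFrame.corr` and drawn as a heatmap. This is a sketch that assumes seaborn's `heatmap` was the chart type. The column names are hypothetical, and the placeholder values are deliberately constructed to be perfectly correlated so the output is easy to check.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder values, built so GDP rises and corruption falls with the score.
df = pd.DataFrame({
    "ladder_score": [4.0, 5.0, 6.0, 7.0],
    "logged_gdp_per_capita": [8.0, 9.0, 10.0, 11.0],
    "perceptions_of_corruption": [0.9, 0.7, 0.5, 0.3],
})

# Pairwise Pearson correlation between the numeric columns.
corr_matrix = df.corr()

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation between parameters")
print(corr_matrix)
```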

Let's see the average corruption perception in each region.
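
A regional average is a `groupby` followed by `mean`. The sketch below uses placeholder values and hypothetical column names.

```python
import pandas as pd

# Placeholder values, not real survey data.
df = pd.DataFrame({
    "regional_indicator": ["Region X", "Region X", "Region Y", "Region Z"],
    "perceptions_of_corruption": [0.75, 0.5, 0.25, 0.5],
})

# Mean corruption perception per region, highest first.
avg_corruption = (
    df.groupby("regional_indicator")["perceptions_of_corruption"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_corruption)
```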

We can see that the highest corruption is found in Central and Eastern Europe, followed by Latin America and South Asia.

The least corruption is found in North America, Australia, and New Zealand, followed by Western Europe.

Finally, let's plot corruption vs. happiness score.

We can see that where corruption is low, the happiness index is high.

By region, Western Europe is at the top in terms of low corruption, followed by North America and ANZ.

Africa has high corruption and less happy people.

Now that you have understood the relationships between the parameters, here are some questions for you.

Q1: Which are the top 10 and bottom 10 happiest countries? Simple question :)

The answer is simple.

Just use the head and tail methods.
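
A sketch of the `head` and `tail` approach. It sorts by score first so the result does not depend on the file already being ranked. Country names and scores are placeholders.

```python
import pandas as pd

# Placeholder scores for 12 made-up countries, deliberately unsorted.
df = pd.DataFrame({
    "country_name": [f"Country {i}" for i in range(1, 13)],
    "ladder_score": [5.0, 7.5, 3.0, 6.5, 4.0, 7.0,
                     2.5, 6.0, 3.5, 5.5, 4.5, 8.0],
})

# Rank from happiest to least happy.
ranked = df.sort_values("ladder_score", ascending=False).reset_index(drop=True)

top_10 = ranked.head(10)     # 10 highest scores
bottom_10 = ranked.tail(10)  # 10 lowest scores

print(top_10)
print(bottom_10)
```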

Q2: Which region has the highest freedom to make life choices?
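
One way to answer this is to average the freedom column per region and pick the extremes with `idxmax` and `idxmin`. The values are placeholders and the column names are hypothetical.

```python
import pandas as pd

# Placeholder values, not real survey data.
df = pd.DataFrame({
    "regional_indicator": ["Region X", "Region X", "Region Y", "Region Z"],
    "freedom_to_make_life_choices": [0.9, 0.8, 0.6, 0.7],
})

freedom_by_region = df.groupby("regional_indicator")["freedom_to_make_life_choices"].mean()

highest_region = freedom_by_region.idxmax()  # region label with the largest mean
lowest_region = freedom_by_region.idxmin()   # region label with the smallest mean
print(highest_region, lowest_region)
```

The same pattern works for life expectancy or any other numeric column.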

As we discussed before, Western Europe leads in this parameter too, and Africa comes last.

Q3: Simple question: what is the average happiness score?
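
The average is a single call to `mean` on the score column. This sketch uses placeholder scores, so the number it prints is not the real 2021 average.

```python
import pandas as pd

# Placeholder scores, not real survey data.
df = pd.DataFrame({"ladder_score": [4.0, 6.0, 8.0]})

average_score = df["ladder_score"].mean()
print(average_score)
```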

Q4: Which region has the highest life expectancy?

Again, Western Europe leads in life expectancy, and Africa is at the bottom.

Q5: Final question: how do you plot the top 10 countries based on corruption?
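
A sketch of one possible answer, assuming "top 10" means the ten countries with the lowest perceived corruption. It uses `nsmallest` and a horizontal bar chart. Names and values are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values for 12 made-up countries.
df = pd.DataFrame({
    "country_name": [f"Country {i}" for i in range(1, 13)],
    "perceptions_of_corruption": [0.60, 0.10, 0.85, 0.30, 0.75, 0.20,
                                  0.90, 0.45, 0.15, 0.55, 0.35, 0.70],
})

# Ten rows with the lowest corruption perception, lowest first.
least_corrupt = df.nsmallest(10, "perceptions_of_corruption")

fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(least_corrupt["country_name"], least_corrupt["perceptions_of_corruption"])
ax.invert_yaxis()  # put the lowest value at the top of the chart
ax.set_xlabel("Perceptions of corruption")
ax.set_title("Top 10 countries with the lowest perceived corruption")
```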

You see! Among these ten, Singapore has the lowest perception of corruption, followed by Rwanda, and the highest is Ireland.

Inferences and Conclusion:

The inference I see is...

The Happiness Index gives us a perception of the following:

A: How is a country administered?

B: What opinion do people have of the government?

C: People's health and lifestyle in that particular country

Also, from the dataset we can infer that Western Europe leads in most parameters, such as life expectancy and GDP per capita. If you look at last year, i.e. 2021, most of the top 5 happiest countries are from Europe. A link is attached in the References section.

Finally, I want to conclude that the more a government works for its country, the happier its people will be, and eventually the ladder score will be high.

References and Future Work

Pandas for Data Manipulation https://pandas.pydata.org/docs/reference/index.html#api

Seaborn for Advanced Visualization https://seaborn.pydata.org/tutorial.html

Matplotlib for interactive visualizations https://matplotlib.org/stable/tutorials/introductory/pyplot.html

Statista for finding the top happiest countries https://www.statista.com/statistics/1225047/ranking-of-happiest-countries-worldwide-by-score/

About Me:

My name is Rakesh. I come from a non-technical background, but I love working with data. I am going to work more and more with data, and eventually learn machine learning and deep learning as well.

You can visit my GitHub for my work and my LinkedIn profile to know more about me.

https://www.linkedin.com/in/rakesh-sethu-n-p-6621b327/

Thank you all. Keep smiling always. Keep analyzing :)

Rakesh Sethu NP

Senior Procurement Executive |Data Analyst | Expertise in Excel, Power BI