This article walks through an analysis of meteorological data (Finland’s weather data) in order to test a Null Hypothesis. The analysis is supported by visualizations built with the Matplotlib library, and several conclusions are drawn from them.
The goal of this analysis is to test the Null Hypothesis H0, which asks: “Do the Apparent temperature and humidity, compared monthly across 10 years of the data, indicate an increase due to Global warming?”
In other words, for a given month, say April, we need to find whether the average Apparent temperature and the average humidity increased from 2006 to 2016. This monthly analysis has to be done for all 12 months over the 10-year period.
The dataset contains hourly records covering roughly 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe. Source URL: https://www.kaggle.com/muthuj7/weather-dataset
1. Importing Libraries and Loading the Dataset
This data analysis task is performed on Google Colab. The initial step is to import all the required libraries and the dataset (weather.csv).
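As a rough sketch of the loading step: the snippet below reads a tiny inline CSV so that it runs anywhere. The column names are my assumption about the Kaggle file's layout, and the rows are made-up placeholders, not real measurements. In Colab you would pass the uploaded weather.csv to read_csv instead.

```python
import io

import pandas as pd

# Toy stand-in for weather.csv. Column names are assumed to match the Kaggle
# file; the two rows are placeholders for illustration only.
sample_csv = io.StringIO(
    "Formatted Date,Summary,Apparent Temperature (C),Humidity\n"
    "2006-04-01 00:00:00.000 +0200,Partly Cloudy,7.4,0.89\n"
    "2006-04-01 01:00:00.000 +0200,Partly Cloudy,7.2,0.86\n"
)

# With the real file this would be: df = pd.read_csv("weather.csv")
df = pd.read_csv(sample_csv)
print(df.shape)
```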
2. Viewing the Dataset
This step focuses on describing and viewing the data in order to understand the dataset better. The head() function is used to return the first n (here 4) rows of the data frame.
After viewing the data, let’s describe it using the describe() method, which reports basic statistics of the data frame such as count, mean, std, min, percentiles, and max. Following this, I have used the info() method to present a concise summary of the data.
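The three inspection calls can be sketched as follows. The small data frame here is a placeholder standing in for the loaded weather data, and its column names are assumed.

```python
import pandas as pd

# Placeholder frame standing in for the loaded weather data (values are made up).
df = pd.DataFrame({
    "Apparent Temperature (C)": [7.4, 7.2, 9.4, 5.9, 6.9],
    "Humidity": [0.89, 0.86, 0.89, 0.83, 0.83],
})

first_rows = df.head(4)   # first 4 rows (head() alone would return 5)
stats = df.describe()     # count, mean, std, min, quartiles, max per numeric column

print(first_rows)
print(stats)
df.info()                 # prints dtypes and non-null counts; returns None
```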
3. Data cleaning
The data may contain unwanted fields and errors that lead to wrong results and misguided decisions. Cleaning it first improves the accuracy of the analysis.
A. Removing unwanted columns
To test the Null Hypothesis we only have to focus on two parameters i.e. Apparent temperature and humidity. Thus, I dropped all the other parameters that the dataset contained using the drop() method.
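A minimal sketch of the column removal, assuming column names like those in the Kaggle file. I also keep the timestamp column here, on the assumption that it is needed later for the monthly grouping; the single row is a placeholder.

```python
import pandas as pd

# Placeholder row with a few of the assumed column names.
df = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200"],
    "Summary": ["Partly Cloudy"],
    "Temperature (C)": [9.5],
    "Apparent Temperature (C)": [7.4],
    "Humidity": [0.89],
    "Wind Speed (km/h)": [14.1],
})

# Keep the two parameters under test plus the timestamp; drop everything else.
keep = ["Formatted Date", "Apparent Temperature (C)", "Humidity"]
unwanted = [col for col in df.columns if col not in keep]
df = df.drop(columns=unwanted)

print(df.columns.tolist())
```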
B. Check for Null values
Further, I checked whether there are any null values in our dataset or not.
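One common way to do this check is isnull().sum(), which counts the missing entries per column. The frame below is a placeholder with one deliberately missing value so the count is visible.

```python
import pandas as pd

# Placeholder data with one missing temperature to make the check visible.
df = pd.DataFrame({
    "Apparent Temperature (C)": [7.4, None, 9.4],
    "Humidity": [0.89, 0.86, 0.89],
})

# isnull() marks missing cells as True; sum() counts them per column.
null_counts = df.isnull().sum()
print(null_counts)
```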
Next, we convert the Time zone to +00:00 UTC.
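The conversion could look like the sketch below, assuming the timestamps live in a column named "Formatted Date" and carry their own UTC offset as in the date range quoted earlier. Passing utc=True to pd.to_datetime normalizes every offset to +00:00. The two rows are placeholders.

```python
import pandas as pd

# Placeholder rows with two different local offsets (summer and winter time).
df = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2006-11-01 00:00:00.000 +0100",
    ],
    "Humidity": [0.89, 0.86],
})

# Parse the strings and convert every timestamp to UTC in one step.
df["Formatted Date"] = pd.to_datetime(df["Formatted Date"], utc=True)

# Using the timestamps as the index makes time-based grouping easier later.
df = df.set_index("Formatted Date")
print(df.index)
```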
5. Data Visualization
In order to get a clearer picture of the dataset and to draw conclusions, I have visualized the data using the Matplotlib library.
First, the whole dataset is plotted as a line graph.
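A line graph of both parameters can be sketched like this. The monthly values below are placeholders rather than real measurements, and the column names and figure styling are my own choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder monthly values (not real measurements), indexed by month start.
index = pd.date_range("2006-01-01", periods=6, freq="MS")
df = pd.DataFrame(
    {
        "Apparent Temperature (C)": [-4.0, -2.0, 3.0, 12.0, 15.0, 18.0],
        "Humidity": [0.85, 0.84, 0.78, 0.64, 0.70, 0.72],
    },
    index=index,
)

# One line per parameter on a shared time axis.
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df.index, df["Apparent Temperature (C)"], label="Apparent Temperature (C)")
ax.plot(df.index, df["Humidity"], label="Humidity")
ax.set_xlabel("Date")
ax.set_title("Apparent temperature and humidity over time")
ax.legend()
plt.show()
```

Because humidity is a fraction between 0 and 1 while temperature is in degrees Celsius, the humidity line looks almost flat on a shared axis; a second y-axis is one way around that.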
This monthly analysis has been done for all 12 months over the 10 year period.
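For a single month, the comparison across years can be sketched as below. I am assuming the data is first averaged per calendar month with resample("MS").mean() and then filtered to one month number; the daily series is synthetic and only there to make the snippet runnable.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic daily series over 3 years (placeholder values, not real data).
days = pd.date_range("2006-01-01", "2008-12-31", freq="D", tz="UTC")
df = pd.DataFrame(
    {
        "Apparent Temperature (C)": [10.0 + 0.5 * (d.year - 2006) for d in days],
        "Humidity": 0.75,
    },
    index=days,
)

monthly = df.resample("MS").mean()          # one averaged row per calendar month
april = monthly[monthly.index.month == 4]   # the April row of every year

# Plot the April averages year by year.
fig, ax = plt.subplots()
ax.plot(april.index.year, april["Apparent Temperature (C)"], marker="o",
        label="Apparent Temperature (C)")
ax.plot(april.index.year, april["Humidity"], marker="o", label="Humidity")
ax.set_xticks(april.index.year)
ax.set_xlabel("Year")
ax.set_title("April averages by year")
ax.legend()
plt.show()
```

Changing the 4 to any other month number from 1 to 12 repeats the same comparison for that month.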
After analyzing Finland’s weather data through various plots and graphs, the following points can be concluded:
- Humidity has almost remained constant (minor changes) over the years 2006 to 2016.
- The Apparent temperature changes for all the months (from September to March the variation is large, and from April to August it is small).
- For the month of April, we observe an abrupt increase in Apparent temperature in 2009, 2011, and 2016, and an abrupt fall in 2010 and 2015. There are only minor changes in the Humidity.
According to the Null Hypothesis, Apparent temperature and humidity compared monthly across 10 years of the data should indicate an increase due to Global warming. What we have observed instead is that Apparent temperature rises in some years and falls in others, while humidity stays almost constant. Thus, the Null Hypothesis is rejected.