Chapter 6 Conclusion

The US Chronic Disease dataset consists of data from 2010 to 2022 for individuals affected with Chronic Diseases in the US.

Key Takeaways:
1. We have modified the data by considering years from 2018 to 2021 to represent the COVID period, the Data Value Type has several categories hence, we have chosen number to represent the population.
2. Through missing value analysis we were able to identify column-wise and row-wise patterns among missing values in our dataset. This allowed us to make a more informed decision about the columns that are useful for further analysis and the columns that needed to be dropped. After transforming our data, we have focused on the top 5 diseases to understand the spread of diseases based on gender and that on different states.
3. Based on analyzing the topic distribution on the location, we were able to identify the top 5 states that have witnessed highest population affected with Chronic Diseases, which are Texas, New York, California, Ohio and Florida.
4. We showed hierarchies in d3 charts to understand the data interactively by digging into it . This gives us the spread of the data just on the click of a button


Limitations:
1. The geolocation points for latitude and longitude are misleading and that needs to be optimized.
2. Data collected has many missing values that forces us to drop some features that might help in obtaining better insights.


Future Implementation:
1. Data exploration gave us the opportunity to find the mortality in the U.S. The next step would be to identify the reason for high mortality in some states v.s the others. It would help us to understand how mortality ir linked to the lifestyle of an individual. It would involve the food habits of the individuals etc.
2. The next step would be to identify correlation among the questions. These questions tell specific information about the disease. How one disease can cause a ripple could help in analzying the same and finding in the dataset. This exploration can gives us the opportunity to come up with the policies which could have a potential impact on the diseases.
3. For instance, if we bring to light the reason for high diabetes caused due to factors such as being physically less active and over weight. We could make general public aware by publishing these findings to create awareness is the most important step in identifying the information.
4. We saw over the last decade the course hasn’t changed much. That means the problem is rooted to much deeper. We should try to explore more versions of this dataset to identify if there is some correlation from different data sources such as BFSS. Apart from mortality there are several other indicators accounting to 124. These could also be brought in to the picture to see correlation and provide reasoning for different values of the dataset in different variables.