Research Topic: Security Challenges in Big Data

Research Project Motivation

Large volumes of data are generated every day, at a high rate, from heterogeneous sources such as government, marketing, social networking, finance and health (Bello-Orgaz, Jung & Camacho, 2016). These data are collected through current technology trends such as Internet of Things devices. Distributed applications and power systems support these connected systems, for example the smart grid, government systems and retail systems. Before big data technology was available, companies could not store their data and archives for long periods, because earlier technologies offered limited storage and weaker security and were expensive. Scalability, performance and flexibility, which those technologies lacked, are therefore required in the big data context. Although big data is an emerging, high-performance technology, it also requires new methods, significant resources and powerful technologies (Ge et al., 2018). Further, big data must be secured, cleaned and analyzed, and must support granular retrieval from data sets.

Industries, enterprises and companies are increasingly aware that the evaluation of data is growing in importance and is becoming a vital factor in staying competitive and in personalizing services. Because big data enables users to extract valuable insights, projects have been launched in many areas. Big data has become a prominent topic because of the sheer amount of data involved. Most of the data stored as big data are images, audio, text and video, which are hard to manage and analyze accurately in a short time. Big data technology is adopted for its ability to store structured, semi-structured and unstructured data. It faces a number of challenges, but the main one is processing and storing the data within a specified time, because the data held by an organization are huge and distributed across many machines.

About 2.5 quintillion bytes of data are generated every day by smart devices, mobile phone GPS signals, purchase transaction records and similar sources. Continuous data generation increases the complexity of storing and processing the data.

Research Questions

The research aims to classify the challenges of big data and their mitigation, so that data gathered from devices around the world can be stored and processed. The following research questions keep the research domain-centric and direct it to the core issues of big data:

  • What is the role and importance of big data in corporations, businesses and daily life? With a focus on big data storage and management, what are its merits and demerits?
  • What security issues does big data raise in various sectors, and what issues do big data users face while using it?
  • Which methods most effectively and reliably reduce the complexity of big data storage and processing, and which components are highly reliable?
  • Why has big data become one of the most recognizable technologies in daily life, and what are its features?
  • Which security challenges need to be addressed to enhance the efficiency and security of big data technology?

Thesis Description

Big data is hard to define, but it can be described as large and complex data that traditional tools cannot manage, store and analyse within an acceptable time. Big data therefore requires new and advanced processing models capable of better storage and decision-making (Kobusińska et al., 2018). Big data technology enables users to interact with, extract and analyse such data. In other words, data is growing along three dimensions: volume, variety and velocity. The first dimension is volume: big data processing ranges from the terabyte (TB) level to the petabyte (PB) level. The second dimension is variety. There are various types of data, and unstructured data such as weblogs, video, audio, locations and pictures is increasing continuously. Big data removes the complication of storing such data in traditional ways such as warehouses or table-based databases. The third dimension, velocity, is the main feature that distinguishes big data from traditional data mining. In big data, the volume of data submitted and accessed is huge. Velocity concerns the response when a user submits a request to the data server: the response should be fast rather than leaving the user waiting for a long time. It has been proposed that the rate of valuable data is inversely proportional to the volume of total data.

Various threat agents target sensitive big data, and big data is at risk because of the sensitivity of its contents. The research focuses on the elimination of these security threats.

Figure 1: Threat agents of Big Data

(Source: practical analytics, 2014)

Figure 1 shows the threat agents. These agents can attack big data in various ways and through various devices (Quinn & Quinn, 2018). It is also reported that 70% of security executives are concerned about cloud and mobile security.

Big data systems are kept online so that data can be stored and accessed at any time. As the volume of data stored and processed grows daily, big data is not only big but also increasingly online, where online means connected to end users. Several challenges and limitations are associated with big data technologies. Currently, the key challenges in exploiting big data are:

  • Lack of knowledge in the organization to take advantage of big data
  • Lack of talent in machine learning and data mining (Carlos et al., 2018)

These limitations show that the existing technology is difficult and complex to understand and use. Big data is an emerging technology, and it will take time to become easy to use, especially at large scale (Elshawi et al., 2018). If big data analysis is used improperly, issues can arise, especially in the following areas:

  • Data policies
  • Technique and technology
  • Access of data
  • Industry structure

Although these nontechnical challenges are outside the focus of this research, they should not be neglected. Several characteristics of big data problems make big data technically challenging (Oussous et al., 2017). To better understand them, the challenges of big data are classified along three dimensions:

Data Challenges


Volume

Every day, a high volume of data is produced by machines and smart devices. It is growing very fast: in the year 2000, approximately 800,000 PB of data was stored in the world (Torrecilla & Romo, 2018). This volume is increasing rapidly and is expected to reach about 35 ZB by the end of 2020. The challenge is therefore how to deal with the increasing size of big data.
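The growth implied by these two figures can be worked out directly. The following is a minimal sketch, assuming decimal (SI) units in which 1 ZB equals 1,000,000 PB and assuming steady compound growth between the two dates; the variable names are illustrative only.

```python
# Worked arithmetic for the storage figures quoted in the text.
# Assumption: decimal (SI) units, so 1 ZB = 1,000,000 PB.
PB_PER_ZB = 1_000_000

stored_2000_pb = 800_000            # figure quoted for the year 2000
expected_2020_pb = 35 * PB_PER_ZB   # 35 ZB expected by the end of 2020
years = 2020 - 2000

# Overall growth factor between the two figures.
growth_factor = expected_2020_pb / stored_2000_pb

# Compound annual growth rate implied by the two figures
# (assumes the same growth rate every year, which is a simplification).
annual_growth = growth_factor ** (1 / years) - 1

print(f"Growth factor over {years} years: {growth_factor:.2f}x")
print(f"Implied annual growth rate: {annual_growth:.1%}")
```

Under these assumptions the stored volume grows by a factor of 43.75 over the twenty years, which corresponds to roughly one fifth more data each year.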

Combining multiple datasets

More than 75% of information today is unstructured, and it is hard to manage and process effectively because its complex structure does not fit into rows and columns (Zhao et al., 2018). Data arriving from smart devices, social collaboration technologies and sensors may be structured, semi-structured or unstructured. The challenge is therefore how to manage the multiplicity of sources, formats and types.
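One common way to handle this multiplicity is to map every source onto a shared record layout before analysis. The sketch below is illustrative only: the field names (device, reading, raw, source) and the sample inputs are assumptions made for the example and do not come from any cited system.

```python
import csv
import io
import json

def from_csv(text):
    """Structured source: every row has the same named columns."""
    return [
        {"device": row["device"], "reading": float(row["reading"]),
         "raw": None, "source": "csv"}
        for row in csv.DictReader(io.StringIO(text))
    ]

def from_json_lines(text):
    """Semi-structured source: one JSON object per line, fields may be absent."""
    records = []
    for line in text.splitlines():
        if line.strip():
            obj = json.loads(line)
            records.append({"device": obj.get("device"),
                            "reading": obj.get("reading"),
                            "raw": None, "source": "json"})
    return records

def from_free_text(text):
    """Unstructured source: no fields can be assumed, so keep the raw text."""
    return [
        {"device": None, "reading": None, "raw": line, "source": "text"}
        for line in text.splitlines() if line.strip()
    ]

# Hypothetical sample inputs, one per kind of source.
csv_text = "device,reading\nmeter-1,20.5\n"
json_text = '{"device": "sensor-7", "reading": 3.2}\n{"device": "sensor-8"}\n'
free_text = "Customer reported a billing error on the mobile app.\n"

records = from_csv(csv_text) + from_json_lines(json_text) + from_free_text(free_text)
for record in records:
    print(record)
```

Missing fields are kept as None rather than guessed, so later processing steps can tell which sources supplied which values.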


Velocity

In a fast-moving world, enterprises and organizations want their data available whenever required, or want real-time analytics at high user volumes (a growing number of people with access) (Habeeb et al., 2018). The velocity challenge is therefore how to manage the flood of information effectively with real-time analytics.
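Real-time analytics is often approached by summarizing only the most recent events instead of re-reading all stored data. The sketch below shows one such technique, a sliding-window average; the class name, the window length and the sample stream are assumptions for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque

class SlidingWindowAverage:
    """Average of the values seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, value) pairs, oldest first
        self.total = 0.0        # running sum of the values in the window

    def add(self, timestamp, value):
        """Record one event and return the current windowed average."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)

# Hypothetical stream of (timestamp in seconds, reading) pairs.
window = SlidingWindowAverage(window_seconds=10)
stream = [(0, 4.0), (5, 6.0), (12, 8.0)]
averages = [window.add(t, v) for t, v in stream]
print(averages)
```

Each new event is handled in amortized constant time, because every event is appended once and evicted at most once, which is what keeps the approach usable as the event rate grows.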

Discovery of Data

Data discovery is a huge challenge in big data, as it is hard to find high-quality data in the huge collection of data available on the web.

Relevancy and Quality

This challenge arises from the difficulty of determining the quality of datasets and their relevance to the issue at hand.

Data Dogmatism

Big data analysis offers remarkable insights, but data users must remain flexible and avoid becoming too beholden to the numbers.

Process Challenges

There are various challenges that come under the process of big data –

  • Data capturing
  • Alignment of data from various sources
  • Transforming the data into a form suitable for analysis
  • Modeling the data
  • Understanding the output, visualizing it and sharing the results

Management Challenges

As big data contains a large amount of sensitive and personal data, access to the data raises various ethical and legal concerns (Tiampo et al., 2018). The data should therefore be secured and access-controlled. The challenges that come under management are:

  • Security
  • Confidentiality
  • Data privacy
  • Ethics
  • Governance

Hence, the challenge is to ensure that sensitive data are used correctly, to track how the data are transformed, used and derived, and to manage the big data lifecycle (Gupta et al., 2018).
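The two requirements above, controlling access and tracking how data are transformed, can be illustrated together. This is a minimal sketch under stated assumptions: the roles, the permission table and the record are hypothetical, and a production system would rely on a dedicated access-control and audit service rather than in-memory structures.

```python
import hashlib
import json

# Hypothetical role-to-permission mapping, for illustration only.
PERMISSIONS = {"analyst": {"read"}, "steward": {"read", "transform"}}

def is_allowed(role, action):
    """Return True if the role is permitted to perform the action."""
    return action in PERMISSIONS.get(role, set())

def fingerprint(record):
    """Stable SHA-256 digest of a record, used to link lineage entries."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def transform(record, role, step_name, func, lineage):
    """Apply `func` to `record` if the role permits it, and log the step."""
    if not is_allowed(role, "transform"):
        raise PermissionError(f"role {role!r} may not transform data")
    result = func(record)
    lineage.append({
        "step": step_name,
        "by": role,
        "input": fingerprint(record),
        "output": fingerprint(result),
    })
    return result

lineage = []
patient = {"name": "Jane Doe", "age": 42}   # made-up sample record
masked = transform(patient, "steward", "mask_name",
                   lambda r: {**r, "name": "***"}, lineage)
print(masked)
print(lineage[0]["step"], lineage[0]["by"])
```

The lineage log stores only digests of the input and output, so the audit trail shows which step changed the data and who ran it without holding a second copy of the sensitive values.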

These are the challenges and limitations of big data. The research focuses further on the elimination of the data challenges discussed above, which include volume, velocity, relevancy and quality.


Research Methodology

The research focuses on big data technology and its security issues. The preceding sections discussed the limitations; this section identifies and derives methodologies that can overcome the big data challenges. Quantitative methodology will be used for its accuracy and effectiveness. Under this methodology, researchers are free to study the security challenges of big data (Carlos et al., 2018), and each works on the findings and topics of their own domain. The quantitative methodology allows researchers to understand the domain in which the research is conducted and guides them in writing the proposal in effective, quantitative and domain-centric terms. It also assists in reviewing research papers. The research also presents a survey of big data and its security issues.

The research focuses mainly on big data technology and its challenges and limitations. It also examines technologies that are effective and adaptable in overcoming the security issues of big data (Thorstad & Wolff, 2018). In addition, it covers the advantages of big data technology in the daily usage of users and organizations.

To keep the research focused on big data technology, recently published research papers and theories were reviewed. Library resources, websites and books were analyzed to collect relevant data on big data technology, and the latest published papers were checked to make the research more precise and current.

A number of limitations and issues are associated with the research topic, big data technology, and a number of works were reviewed in order to address them. From the relevant literature, several approaches were identified that are reliable and effective in eliminating the issues of big data technology (Elshawi et al., 2018). The following features of quantitative methodology support its adoption:

  • Identify the process – It is important to list the limitations associated with big data. In the initial stage, therefore, the big data challenges are identified, together with the solutions that can eliminate them.
  • Literature Review – This stage identifies and analyzes the approaches used in different works of literature. Reviewing several works helps to identify and evaluate the challenges and advantages of big data analytics. At the end of this stage, the merits and demerits of each work are set out.
  • State of art – In the final stage, the most effective and reliable solution among those listed in the literature is identified. The state-of-the-art solution is the one that can overcome the limitation in focus (Seles et al., 2018). In the end, the solutions are consolidated into a list of effective solutions.

The methodology section thus provides information about the solutions for overcoming the security issues of big data technology.

Research Schedule

To manage the tasks and activities effectively, the following table presents each task with its schedule, in the manner of a Gantt chart:

Task Description | Start | Finish | Complete | Deliverable
Verbal presentation | Mon 10-09-18 | Tue 11-09-18 | Yes | Verbal presentation completed
Overview the literature based on cloud-based database | Wed 12-09-18 | Mon 17-09-18 | Yes | Overview phase completed
Proposal preparation | Tue 18-09-18 | Wed 19-09-18 | Yes | Proposal completed
Supervisor and specialist interaction | Thu 20-09-18 | Fri 21-09-18 | Yes | Proposal updated
Finalize the proposal | Mon 24-09-18 | Tue 25-09-18 | Yes | Submitted
Best technology selection | Wed 26-09-18 | Fri 28-09-18 | Yes | Completed
Second verbal presentation | Mon 01-10-18 | Thu 04-10-18 | Yes | Submitted
Data gathering for review | Fri 05-10-18 | Mon 08-10-18 | Yes | Notes prepared
Prepare Literature Review | Tue 09-10-18 | Thu 11-10-18 | Yes | Completed
Consulting with superiors | Fri 12-10-18 | Wed 17-10-18 | Yes | Literature review updates
Literature reviews finalization | Thu 18-10-18 | Fri 19-10-18 | Yes | Submitted

Table 1 – Timeline to present the Research Proposal





References

Bello-Orgaz, G., Jung, J. J., & Camacho, D., 2016. Social big data: Recent achievements and new challenges. Information Fusion, 28, 45-59.

Carlos, R. C., Kahn, C. E., & Halabi, S., 2018. Data science: big data, machine learning, and artificial intelligence. Journal of the American College of Radiology, 15(3), 497-498.

Elshawi, R., Sakr, S., Talia, D., & Trunfio, P., 2018. Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service. Big Data Research.

Ge, M., Bangui, H., & Buhnova, B., 2018. Big Data for Internet of Things: A Survey. Future Generation Computer Systems.

Gupta, S., Kar, A. K., Baabdullah, A., & Al-Khowaiter, W. A., 2018. Big data with cognitive computing: A review for the future. International Journal of Information Management, 42, 78-89.

Gupta, S., Mateu, J., Degbelo, A., & Pebesma, E., 2018. Quality of life, big data and the power of statistics. Statistics & Probability Letters, 136, 101-104.

Habeeb, R. A. A., Nasaruddin, F., Gani, A., Hashem, I. A. T., Ahmed, E., & Imran, M., 2018. Real-time big data processing for anomaly detection: A Survey. International Journal of Information Management.

Kobusińska, A., Leung, C., Hsu, C. H., Raghavendra, S., & Chang, V., 2018. Emerging trends, issues, and challenges in Internet of Things, Big Data and cloud computing.

Oussous, A., Benjelloun, F. Z., Lahcen, A. A., & Belfkih, S., 2017. Big Data technologies: A survey. Journal of King Saud University-Computer and Information Sciences.

Quinn, P., & Quinn, L., 2018. Big genetic data and its big data protection challenges. Computer Law & Security Review.

Security Analytics – Big Data Use Case., 2014. Business Analytics 3.0. Retrieved 27 September 2018, from

Seles, B. M. R. P., de Sousa Jabbour, A. B. L., Jabbour, C. J. C., de Camargo Fiorini, P., Mohd-Yusoff, Y., & Thomé, A. M. T., 2018. Business opportunities and challenges as the two sides of the climate change: Corporate responses and potential implications for big data management towards a low carbon society. Journal of Cleaner Production, 189, 763-774.

Thorstad, R., & Wolff, P., 2018. A big data analysis of the relationship between future thinking and decision-making. Proceedings of the National Academy of Sciences, 115(8), E1740-E1748.

Tiampo, K. F., McGinnis, S., Kropivnitskaya, Y., Qin, J., & Bauer, M. A., 2018. Big Data Challenges and Hazards Modeling. In Risk Modeling for Hazards and Disasters (pp. 193-210).

Torrecilla, J. L., & Romo, J., 2018. Data learning from big data. Statistics & Probability Letters, 136, 15-19.

Zhao, Y., Zhang, H., An, L., & Liu, Q., 2018. Improving the approaches to traffic demand forecasting in the big data era. Cities.