601 Recent Trends in IT 2 Marks Questions with Answers
1. What is data mining?
Ans. : A large amount of data is available in different industries and organizations. The availability of this huge data is of no use unless it is converted into valuable information; otherwise we are drowning in data but starving for knowledge. The solution to this problem is data mining, which is the extraction of useful information from the huge amount of data that is available.
Data mining is defined as: "Data mining, also known as Knowledge Discovery in Data (KDD), is the process of uncovering patterns and other valuable information from large data sets".
2. What is data warehousing?
Ans. : A Data Warehouse, also known as a DWH, is a system used for reporting and data analysis. The data warehouse concept supports decision support systems in which a large amount of data is merged. A data warehouse is a repository that sits on top of multiple databases. It can be defined as a process for collecting and managing data from varied sources to provide meaningful business insights.
3. Explain Spark with its features?
Ans. : Apache Spark has the following features:
Speed : The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application. Spark can run an application in a Hadoop cluster up to 100 times faster in memory and 10 times faster when running on disk.
Multiple language support : Spark supports multiple languages. It provides APIs written in Java, Scala, Python and R.
Multiple platform support : Spark runs on multiple platforms without compromising processing speed. It runs on Hadoop, Kubernetes, Mesos, Standalone, and even in the cloud.
Advanced analytics : Spark supports not only 'Map' and 'Reduce' but also SQL queries, streaming data, Machine Learning (ML) and graph algorithms.
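To make the in-memory and multi-language points concrete, here is a minimal PySpark sketch. It assumes pyspark is installed and runs locally; the app name "demo" and the sample numbers are illustrative only, not from the original text.

```python
# Minimal PySpark sketch (pyspark assumed installed; names are illustrative).
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

# In-memory computing: caching keeps the dataset in memory, so repeated
# actions avoid recomputation.
nums = spark.sparkContext.parallelize(range(1, 1001)).cache()
print(nums.map(lambda x: x * x).sum())  # a simple map + reduce-style action

spark.stop()
```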
4. What is AI? Explain its applications?
Ans. : AI is the science and engineering of making machines intelligent. Its applications include:
1. Gaming : AI plays a crucial role in strategic games such as Chess, Poker, Tic-Tac-Toe, etc., where machines can think of a large number of possible positions based on heuristic knowledge.
2. Robotics : Robotics is a branch of AI which combines Electrical Engineering, Mechanical Engineering and Computer Science for the design, construction and application of robots.
3. Cognitive Science : It is the interdisciplinary, scientific study of human behavior and intelligence, with a focus on how information is perceived, processed and transformed.
5. Define Search Strategy?
Ans. : The word 'search' refers to the search for a solution in a problem space.
Search proceeds with different types of search control strategies. A strategy is defined by picking the order in which the nodes are expanded.
So far, we have not given much attention to the question of how to decide which rule to apply next during the process of searching for a solution to a problem. This question arises when more than one rule will have its left side match the current state.
In a search method or technique, we first select one option and set the others aside. If this option is our final goal, we stop the search; otherwise we continue selecting, testing and expanding until either a solution is found or there are no more states to be expanded.
The depth-first search and breadth-first search are the two common search strategies.
6. Explain BFS, DFS, DLS?
Ans. : BFS (Breadth-first Search)
Breadth first searches are performed by exploring all nodes at a given depth before proceeding to the next level. This means that all immediate children of a node are explored before any of the children's children are considered. This process is called Breadth First Search.
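A minimal Python sketch of BFS, assuming an adjacency-list graph; the example graph and node names are invented for illustration.

```python
# Breadth First Search over an adjacency-list graph (illustrative example).
from collections import deque

def bfs(graph, start, goal):
    frontier = deque([start])       # FIFO queue: explore level by level
    visited = {start}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            return True
        for child in graph.get(node, []):  # all children at this depth first
            if child not in visited:
                visited.add(child)
                frontier.append(child)
    return False

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(graph, "A", "D"))  # True
```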
7. Explain DFS
Ans. : Depth first searches are performed by going downward into a tree as quickly as possible. A single branch of the tree is followed until it produces a solution or until a decision to terminate the path is made. It makes sense to terminate a path if it reaches a dead end, reproduces a previous state or becomes longer than some limit; in such cases backtracking occurs. This search procedure with backtracking is known as Depth First Search.
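A minimal recursive DFS sketch under the same assumed adjacency-list layout as the BFS example above.

```python
# Depth First Search: follow one branch as deep as possible, backtrack on
# dead ends (the recursion unwinding is the backtracking).
def dfs(graph, node, goal, visited=None):
    if visited is None:
        visited = set()
    if node == goal:
        return True
    visited.add(node)
    for child in graph.get(node, []):
        if child not in visited and dfs(graph, child, goal, visited):
            return True
    return False  # dead end on this branch
```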
8. Explain DLS
Ans. : Depth Limited Search is an uninformed search algorithm. The unbounded tree problem that appears in the depth first search algorithm can be fixed by imposing a boundary, or limit, on the depth of the search domain. The Depth Limited Search (DLS) method is almost identical to Depth First Search (DFS).
But DLS can work on infinite state space problems because it bounds the depth of the search tree with a predetermined limit L. Nodes at this depth limit are treated as if they had no successors.
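A sketch of DLS, again over an assumed adjacency-list graph: it is the DFS above cut off at a depth limit L, with nodes at the limit treated as if they had no successors.

```python
# Depth Limited Search: DFS cut off at a predetermined depth limit L.
def dls(graph, node, goal, limit):
    if node == goal:
        return True
    if limit == 0:
        return False  # treat nodes at the limit as having no successors
    return any(dls(graph, child, goal, limit - 1)
               for child in graph.get(node, []))
```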
9. Explain Uniform Cost Search?
Ans. : Uniform Cost Search is a searching algorithm used for traversing a weighted tree or graph. This algorithm comes into play when a different cost is available for each edge. The primary goal of Uniform Cost Search is to find a path to the goal node which has the lowest cumulative cost. Uniform Cost Search expands nodes according to their path costs from the root node. It can be used to solve any graph or tree where the optimal cost is in demand. The Uniform Cost Search algorithm is implemented using a priority queue.
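A minimal sketch using Python's heapq as the priority queue; the {node: [(neighbor, cost), ...]} graph layout and the sample edges are assumptions for illustration.

```python
# Uniform Cost Search: always expand the node with the lowest path cost.
import heapq

def uniform_cost_search(graph, start, goal):
    frontier = [(0, start)]              # priority queue ordered by path cost
    best = {start: 0}
    while frontier:
        cost, node = heapq.heappop(frontier)   # cheapest frontier node
        if node == goal:
            return cost                  # lowest cumulative cost to the goal
        for nbr, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                heapq.heappush(frontier, (new_cost, nbr))
    return None

graph = {"A": [("B", 1), ("C", 5)], "B": [("C", 1)], "C": []}
print(uniform_cost_search(graph, "A", "C"))  # 2, via A -> B -> C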
10. Iterative Deepening Search
Ans. : The iterative deepening algorithm is a combination of DFS and BFS algorithms. This search algorithm finds out the best depth limit and does it by gradually increasing the limit until a goal is found. This algorithm performs Depth First Search up to a certain "depth limit", and it keeps increasing the depth limit after each iteration until the goal node is found. This Search algorithm combines the benefits of Breadth First Search's fast search and Depth First Search's memory efficiency.
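A short sketch of the idea, reusing the dls function from the DLS sketch above: run DLS with a gradually increasing depth limit until the goal is found.

```python
# Iterative deepening: DFS memory use with BFS-like shallowest-first search.
def iterative_deepening(graph, start, goal, max_depth=50):
    for limit in range(max_depth + 1):   # keep increasing the depth limit
        if dls(graph, start, goal, limit):
            return limit                 # shallowest limit at which goal is found
    return None
```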
11. Write Hill Climbing algorithm?
Ans. : The algorithm for Hill Climbing is as follows:
Step 1 : Evaluate the initial state. If it is a goal state, then stop and return success. Otherwise, make the initial state the current state.
Step 2: Loop until the solution state is found or there are no new operators present which can be applied to the current state.
(a) Select an operator that has not yet been applied to the current state and apply it to produce a new state.
(b) Perform these steps to evaluate the new state:
(i) If the new state is a goal state, then stop and return success.
(ii) If it is better than the current state, then make it the current state and proceed further.
(iii) If it is not better than the current state, then continue in the loop until a solution is found.
Step 3: Exit
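A minimal Python sketch of the loop above; the evaluate, neighbors and is_goal callables and the toy objective are assumptions for illustration.

```python
# Hill climbing: move to a better successor until the goal is reached or
# no successor improves on the current state (a local maximum).
def hill_climbing(evaluate, neighbors, state, is_goal):
    while not is_goal(state):                      # Step 2: loop
        better = next((n for n in neighbors(state)
                       if evaluate(n) > evaluate(state)), None)
        if better is None:
            return state                           # no improving operator left
        state = better                             # (ii): adopt the better state
    return state                                   # (i): goal state reached

# Toy usage: climb toward the maximum of -(x - 3)^2 over the integers.
print(hill_climbing(lambda x: -(x - 3) ** 2,
                    lambda x: [x - 1, x + 1],
                    0,
                    lambda x: x == 3))  # 3
```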
12. Write a note on OLTP Databases?
Ans. : OLTP stands for Online Transaction Processing. It manages transaction-oriented applications. It is an online database modifying system. Its basic focus is on manipulating the database. The queries are short and simple. The modeling of OLTP is industry oriented.
The main purpose is to control day-to-day transactions in the database. A smaller amount of data is accessed. Relational databases are created for Online Transaction Processing (OLTP).
13. Write a note on OLAP Databases?
Ans. : OLAP stands for Online Analytical Processing. It serves multi-dimensional analytical queries for reporting. It is an online query answering system. Its main focus is to analyze and extract data for strategic decision making. The queries are long and complex. The design of OLAP is subject or domain specific. Its main purpose is to find hidden data and support decision making. A large amount of data is accessed. A data warehouse is designed for Online Analytical Processing (OLAP).
14. List Major Components of a Data Warehouse System?
Ans. : CRM, Billing, ETL, Flat Files, Data Warehouse, Reporting, Data Mining
15. List out the steps of the KDD (Knowledge Discovery in Data) process?
Ans. : 1. Selection: The data which is to be mined may not be necessarily from a single source. The data may have many heterogeneous origins. This data needs to be obtained from various data sources and files. The data selection is based on your mining goal. Data relevant to the mining task is selected from various sources.
2. Pre-processing: Pre-processing involves cleaning of the data and integration of the data. The data selected for mining purposes may contain incorrect or irrelevant values which lead to unwanted results. Some values may be missing or erroneous. Also, when data is collected from heterogeneous sources, it may involve varying data types and metrics. So, this data needs to be cleaned and integrated to eliminate noise and inconsistency.
3. Transformation: Data transformation is the process of converting the data into the format which is suitable for processing. Here, data is created in the form which is required by the data mining process.
4. Data Mining: The data mining step applies methods and techniques to extract the patterns present in the data, for example by transforming relevant data records into patterns using classification. This step involves the application of various data mining algorithms to the transformed data and generates the desired results for which the whole KDD process is undertaken.
5. Visualization/Interpretation: This is the last step in the KDD process. In this step, the data is presented to the user in the form of reports, tables or graphs. The presentation of the data to the users directly affects the usefulness of the discovered knowledge.
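A hedged end-to-end sketch of these five steps with pandas; the file name sales.csv and the amount/region columns are invented for illustration.

```python
# KDD steps as a tiny pandas pipeline (file and column names are assumed).
import pandas as pd

df = pd.read_csv("sales.csv")                    # 1. Selection
df = df.dropna().drop_duplicates()               # 2. Pre-processing
df["amount"] = (df["amount"] - df["amount"].mean()) / df["amount"].std()  # 3. Transformation
pattern = df.groupby("region")["amount"].mean()  # 4. Data mining (a trivial pattern)
print(pattern.sort_values(ascending=False))      # 5. Interpretation / reporting
```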
16. What is Prediction?
Ans. : Prediction is a classification task. Prediction discovers the relationship between dependent and independent variables. It can also be viewed as estimation. The prediction is based on the data in hand, and future trends of a phenomenon can be predicted using predictive algorithms. The best example of prediction is the profit that could be gained out of a sale. Prediction is the technique of identifying the unavailable numerical data for a new process. Prediction applications include flooding, speech recognition, machine learning, and pattern recognition.
17. Define Predictive Data Mining?
Ans. : Predictive data mining tasks make predictions based on the available data set in hand. These tasks build a model from the data and predict future trends related to that data, or unknown values that may be of interest for the future. Examples of predictive tasks include predicting the future value of gold according to the current market trend, and predicting a high or low value of a share in the share market based on its previous growth. Predictive data mining includes Classification, Regression, Prediction and Time Series Analysis.
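A minimal predictive sketch with scikit-learn (assumed to be installed); the toy month/price numbers are invented and do not represent real market data.

```python
# Fit a line to past values and predict the next one (toy data only).
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]        # e.g. month index
y = [10.0, 12.1, 13.9, 16.2]    # e.g. observed price
model = LinearRegression().fit(X, y)
print(model.predict([[5]]))     # predicted value for the future month 5
```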
18. Define Descriptive Data Mining?
Ans. : Descriptive data mining tasks involve the analysis of available data patterns or models to find new, interesting and significant information based on the available data set. An example of a descriptive data mining task is rearranging the placement of goods in a supermarket according to the purchase patterns of the customers. Descriptive data mining includes Clustering, Summarization, Association Rules and Sequence Discovery.
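A minimal descriptive sketch: clustering with k-means via scikit-learn (assumed installed); the toy points are illustrative.

```python
# Group similar records into clusters (a descriptive task).
from sklearn.cluster import KMeans

points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
labels = KMeans(n_clusters=2, n_init=10).fit_predict(points)
print(labels)  # two groups of similar purchase-like patterns
```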
19. What is data integration?
Ans. : Data integration is the process of combining data from disparate sources into a meaningful and valuable data set for the purpose of analysis. In this step, a logical data source is prepared. This is done by collecting and integrating data from multiple sources like databases, legacy systems, flat files and data cubes.
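A small pandas sketch of the idea; the CRM/billing tables and their column names are assumptions.

```python
# Integrate two sources into one logical data set via a shared key.
import pandas as pd

crm = pd.DataFrame({"cust_id": [1, 2], "name": ["Asha", "Ravi"]})
billing = pd.DataFrame({"cust_id": [1, 2], "total": [250.0, 480.0]})
combined = crm.merge(billing, on="cust_id")  # join on the common key
print(combined)
```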
20. What is graph mining?
Ans. : Graph Mining is the set of tools and techniques used to:
(a) analyze the properties of real-world graphs;
(b) predict how the structure and properties of a given graph might affect some application;
(c) develop models that can generate realistic graphs that match the patterns found in real-world graphs of interest.
21. Explain Web Mining?
Ans. : As a tremendous amount of data is generated daily on the web, mining this data is very essential. Web mining refers to the mining of data related to the World Wide Web.
This data contains the actual data present on the web as well as data related to the web. Web data can be classified into the following categories:
(a) Content of the actual web page.
(b) Inter-page structure containing the actual linkage structure between web pages.
(c) Intra-page structure containing HTML or XML code.
(d) Web page access logs.
(e) User profiles.
22. Explain Spatial Mining?
Ans. : Spatial data are data about objects that are located in a physical space. This includes data related to space, including maps. Spatial mining is the application of data mining to spatial data. In spatial mining, geographic or spatial information is used to produce the results: knowledge, spatial relationships, or other interesting patterns stored in spatial databases are extracted. Spatial mining is applied to learning spatial records, discovering spatial relationships and relationships among spatial and non-spatial records, constructing spatial knowledge bases, reorganizing spatial databases, and optimizing spatial queries.
23. Explain Temporal Mining?
Ans. : A temporal database stores data relating to time instances. Temporal Data Mining is
a single step in the process of Knowledge Discovery in Temporal Databases that
enumerates structures (temporal patterns or models) over the temporal data, and any
algorithm that enumerates temporal patterns from, or fits models to, temporal data is a
Temporal Data Mining Algorithm.
Temporal Data Mining often involves processing time series, typically sequences of data,
which measure values of the same attribute at a sequence of different time points.
24. Difference Between Verification & Discovery?
Ans. : Verification :
1. It takes a hypothesis from the user and tests its validity against the data.
2. The emphasis is on the user, who is responsible for formulating the hypotheses and issuing the query on the data to affirm or negate each hypothesis.
3. No new information is created in the retrieval process.
4. The search process is iterative: the output is reviewed, a new set of questions or hypotheses is formulated to refine the search, and the whole process is repeated.
Discovery :
1. Knowledge discovery is the concept of analyzing a large amount of data and gathering relevant information, leading to the extraction of meaningful rules, patterns and models from the data.
2. The discovery model differs in its emphasis in that it is the system that automatically discovers important information hidden in the data.
3. The discovery or data mining tools aim to reveal many facts about the data.
4. The data is sifted in search of frequently occurring patterns, trends and generalizations about the data, without intervention or guidance from the user.
25. List the software used for data mining?
Ans. : R
R is an open-source programming language and an environment for statistical computing and graphics, based on the S language developed at Bell Laboratories. It is compatible with UNIX platforms, FreeBSD, Linux, macOS, and Windows operating systems. R is popular for data mining as it is used to run a variety of statistical analyses, such as time-series analysis, clustering, and linear and non-linear modelling. R also supplies excellent data mining packages. Overall, R also offers graphical facilities for data analysis. The applications of R include statistical computing, analytics, and machine learning tasks.
Weka
Weka is a collection of machine learning algorithms for data mining tasks. It is open-source software that provides tools for data pre-processing and implementations of several machine learning algorithms. The algorithms can either be applied directly to a data set or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Weka is comprehensive software that lets you pre-process big data, apply different machine learning algorithms to it, and compare the various outputs. This software makes it easy to work with big data and train a machine using machine learning algorithms.
26. What is data cleaning?
Ans. : The first step in data pre-processing is data cleaning. It is also known as scrubbing.
Data cleaning includes handling missing data and noisy data.
(a) Missing data : This is the case wherein some of the attributes or attribute values are missing or the data is not normalized. This situation can be handled by either ignoring the values or filling in the missing values.
(b) Noisy data : This is data with errors, or data which has no meaning at all. This type of data can either lead to invalid results or create problems for the mining process itself. The problem of noisy data can be solved with binning methods, regression and clustering.
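A hedged pandas sketch of both cases; the age/income columns and their values are invented for illustration.

```python
# Handle missing data by filling, and smooth noisy data by binning
# (replace each value with the mean of its bin).
import pandas as pd

df = pd.DataFrame({"age": [25, None, 40, 38],
                   "income": [50, 52, 9999, 55]})
df["age"] = df["age"].fillna(df["age"].mean())   # fill the missing value
bins = pd.qcut(df["income"], q=2)                # equal-frequency bins
df["income"] = df.groupby(bins, observed=True)["income"].transform("mean")
print(df)
```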
27. What is RDD (Resilient Distributed Datasets)?
Ans. : RDD is a fundamental data structure of Apache Spark. It is an immutable collection of objects which is computed on the different nodes of the cluster. Decomposing the name RDD:
Resilient : i.e. fault-tolerant, with the help of the RDD lineage graph, and so able to recompute missing or damaged partitions due to node failures.
Distributed : since the data resides on multiple nodes.
Dataset : it represents the records of the data you work with. The user can load the data set externally, from a JSON file, CSV file, text file or a database via JDBC, with no specific data structure.
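A minimal RDD sketch in PySpark (assumed installed, running locally); the sample numbers are illustrative.

```python
# RDD basics: an immutable, partitioned collection; transformations are
# recorded in the lineage graph, so lost partitions can be recomputed.
from pyspark.sql import SparkSession

sc = (SparkSession.builder.master("local[*]")
      .appName("rdd-demo").getOrCreate().sparkContext)

rdd = sc.parallelize([1, 2, 3, 4, 5])   # distributed across local partitions
squares = rdd.map(lambda x: x * x)      # transformation: lazily recorded
print(squares.collect())                # action: triggers the computation
```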
28. Define the functions of Spark Core?
Ans. : Spark Core is the base engine for distributed data processing and large-scale parallel processing. Spark Core is also known as the distributed execution engine, while the Java, Scala, and Python APIs offer a platform for ETL applications. Spark Core can perform different functions like monitoring jobs, storage system interactions, job scheduling, memory management, and fault tolerance. Further, it supports streaming, machine learning, and SQL workloads.
The Spark core can also be responsible for:
1. Monitoring, Scheduling, and distributing jobs on a cluster
2. Fault recovery and memory management
3. Ecosystem interactions
29. Components of the Spark Ecosystem?
Ans. : The main components of Apache Spark are as shown in the figure:
[Figure: Apache Spark ecosystem]
Spark Core
Spark Core is the underlying general execution engine for the Spark platform, upon which all the other functionality is built.
It provides in-memory computing and referencing of datasets in external storage systems.
Spark SQL
Spark SQL is a component above Spark Core. It contains a new data abstraction called
SchemaRDD.
SchemaRDD provides support for structured and semi-structured data. It supports many sources of data, including Hive tables, Parquet and JSON.
Spark Streaming:
Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming
analytics. It ingests data in mini-batches and performs RDD (Resilient Distributed Datasets)
transformations on those mini-batches of data.
30. List the functions of Spark SQL
Ans. : API: When writing and executing Spark SQL from Scala, Java, Python or R, a
SparkSession is still the entry point. Once a SparkSession has been established, a
DataFrame or a Dataset needs to be created on the data before Spark SQL can be
executed.
Spark SQL CLI : The Spark SQL Command Line Interface is a lifesaver for writing and testing out SQL. However, the SQL is executed against Hive, so make sure test data exists in some capacity.
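A hedged sketch of the API path described above: a SparkSession as the entry point, a DataFrame registered as a view, then SQL executed against it (PySpark assumed; the sample rows are invented).

```python
# SparkSession -> DataFrame -> temp view -> SQL, per the API description.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").getOrCreate()
df = spark.createDataFrame([(1, "Asha"), (2, "Ravi")], ["id", "name"])
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE id = 2").show()
```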