Source: Safalta.com
Python Pandas Interview Questions (30-35 Questions)
1. What exactly is Pandas (Python pandas)?
Pandas is an open-source Python library that offers high-performance data manipulation. Its name comes from the phrase "panel data", an econometrics term for multidimensional structured data sets. It was created in 2008 by Wes McKinney and is used for data analysis in Python. It supports the five typical steps of data processing and analysis (load, manipulate, prepare, model, and analyze), regardless of where the data came from.
2. What are the various Pandas data structure types?
Pandas offers two main data structures: Series and DataFrame. Both are built on top of NumPy. A Series is a one-dimensional data structure, whereas a DataFrame is a two-dimensional one.
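The difference in dimensionality can be checked directly. The following is a minimal sketch; the column names and values are made up for illustration.

```python
import pandas as pd

# A Series is one-dimensional: a single column of values plus an index.
s = pd.Series([10, 20, 30], name="score")

# A DataFrame is two-dimensional: rows plus named columns.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [10, 20, 30]})

print(s.ndim, s.shape)    # number of dimensions and shape of the Series
print(df.ndim, df.shape)  # number of dimensions and shape of the DataFrame

# Selecting a single column of a DataFrame gives back a Series.
column = df["score"]
print(type(column))
```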
3. How does Pandas define a Series?
A Series is a one-dimensional labelled array that can store data of any type. The row labels of a Series are called its index. A list, tuple, or dictionary can be turned into a Series by passing it to the Series() constructor. A Series cannot have more than one column.
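As a rough illustration of the conversions mentioned above, the sketch below builds a Series from a list, a tuple, and a dictionary. The variable names and values are illustrative only.

```python
import pandas as pd

# From a list: a default integer index (0, 1, 2, ...) is created.
from_list = pd.Series([1, 2, 3])

# From a tuple: behaves the same way as a list.
from_tuple = pd.Series((1.5, 2.5, 3.5))

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# Mixed types are allowed; the values are then stored with dtype "object".
mixed = pd.Series([1, "two", 3.0])

print(from_dict.index)
print(mixed.dtype)
```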
4. What does GroupBy mean in Pandas?
The groupby() function lets us reorganise a data set by splitting it into groups. The groups are formed according to one or more criteria, such as the values in a column, and the split can be made along any axis of the object. An aggregation or transformation is then typically applied to each group.
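A minimal sketch of splitting a data set into groups and summarising each group is shown here. The DataFrame `sales` and its columns are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "amount": [100, 150, 200, 50, 75],
})

# Split the rows by region, then sum the amounts within each group.
totals = sales.groupby("region")["amount"].sum()
print(totals)

# The groups themselves can also be inspected one at a time.
for region, group in sales.groupby("region"):
    print(region, len(group))
```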
5. How can the index be reset?
The DataFrame's reset_index() method resets the index to the default integer index. If the DataFrame has a MultiIndex, this method can remove one or more levels.
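The sketch below shows reset_index() on a simple index and on a MultiIndex. The data and the level names "letter" and "number" are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=["x", "y", "z"])

# The old index is kept as a new column named "index".
reset_keep = df.reset_index()

# With drop=True the old index is discarded instead.
reset_drop = df.reset_index(drop=True)

# With a MultiIndex, a single level can be moved back into the columns.
multi = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product(
        [["a", "b"], [1, 2]], names=["letter", "number"]
    ),
)
one_level = multi.reset_index(level="number")

print(reset_keep)
print(one_level)
```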
6. What is reindexing?
Reindexing changes the row and column labels of a DataFrame to match a new index. We can reindex one or more rows or columns with the reindex() method. Labels in the new index that are not present in the DataFrame are assigned NaN by default.
DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
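A small sketch of reindex() follows, assuming a toy DataFrame with labels "x", "y", and "z". The label "w" is deliberately absent from the original index.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])

# "w" does not exist in the original index, so its row is filled with NaN.
re_nan = df.reindex(["x", "z", "w"])

# fill_value replaces the default NaN for missing labels.
re_fill = df.reindex(["x", "z", "w"], fill_value=0)

print(re_nan)
print(re_fill)
```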
7. What is multiple indexing?
Multiple indexing (MultiIndex, also called hierarchical indexing) is an important indexing technique for data analysis and manipulation, particularly when working with higher-dimensional data. It lets us store and work with data that has an arbitrary number of dimensions in lower-dimensional data structures such as Series and DataFrame.
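To make this concrete, here is a minimal sketch of a Series with a two-level index. The level names "year" and "quarter" and the figures are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([10, 12, 14, 18], index=index)

# Select every quarter of one year using the outer level.
year_2024 = revenue.loc["2024"]

# Select a single value by giving a label for each level.
single = revenue.loc[("2023", "Q2")]

# The second dimension can be recovered by unstacking a level into columns.
table = revenue.unstack("quarter")

print(year_2024)
print(single)
print(table)
```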
8. What is a Pandas Index?
A Pandas Index holds the labels that are used to select specific rows and columns of a DataFrame. Its job is to organise the data for quick access.
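The sketch below shows the row and column Index objects of a small DataFrame and how their labels are used for selection. The product names and figures are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [3.5, 2.0, 4.25], "stock": [10, 0, 7]},
    index=["apple", "banana", "cherry"],
)

# Both the row labels and the column labels are Index objects.
print(df.index)
print(df.columns)

# Labels from the index select rows; labels from columns select columns.
banana_stock = df.loc["banana", "stock"]
prices = df.loc[["apple", "cherry"], "price"]

print(banana_stock)
print(prices)
```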
9. What is data aggregation?
Data aggregation applies an aggregation function to one or more columns. Common aggregation functions include:
sum: returns the sum of the values for the requested axis.
min: returns the minimum of the values for the requested axis.
max: returns the maximum of the values for the requested axis.
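The functions listed above can be applied together with agg(). This is a minimal sketch on an invented two-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Apply several aggregations to every column at once.
summary = df.agg(["sum", "min", "max"])

# Apply a different aggregation to each column.
per_column = df.agg({"a": "sum", "b": "max"})

print(summary)
print(per_column)
```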
10. How is a String converted to a date?
The following code shows how to change the string to a date:
- from datetime import datetime
- # Define the dates as strings
- dmy_str1 = 'Saturday, July 14, 2018'
- dmy_str2 = '14/7/17'
- dmy_str3 = '14-07-2017'
- # Convert the strings to datetime objects (the day comes before the month in the last two formats)
- dmy_dt1 = datetime.strptime(dmy_str1, '%A, %B %d, %Y')
- dmy_dt2 = datetime.strptime(dmy_str2, '%d/%m/%y')
- dmy_dt3 = datetime.strptime(dmy_str3, '%d-%m-%Y')
- # Print the converted dates
- print(dmy_dt1)
- print(dmy_dt2)
- print(dmy_dt3)
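Within Pandas itself, the same kind of conversion is usually done with pd.to_datetime(). The sketch below assumes day-first date strings similar to the ones above; the Series contents are illustrative.

```python
import pandas as pd

# Convert a whole column of day-month-year strings in one call.
raw = pd.Series(["14-07-2017", "15-07-2017"])
dates = pd.to_datetime(raw, format="%d-%m-%Y")

# A single string can be converted as well; the result is a Timestamp.
single = pd.to_datetime("14/7/17", format="%d/%m/%y")

print(dates)
print(single)
```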
11. Explain time periods.
A time period represents a span of time, such as a day, month, quarter, or year. In Pandas it is represented by the Period class, which describes a span of time at a given frequency.
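A brief sketch of the Period class follows. The specific quarter and months are chosen only for illustration.

```python
import pandas as pd

# A quarterly period covers a span of time, not a single instant.
q = pd.Period("2023Q1", freq="Q")
print(q.start_time, q.end_time)

# Arithmetic on a period moves it by whole units of its frequency.
m = pd.Period("2023-01", freq="M")
next_month = m + 1
print(next_month)

# A range of consecutive periods.
periods = pd.period_range("2023-01", periods=3, freq="M")
print(periods)
```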
12. How does time offset work?
An offset specifies a set of dates that conform to a DateOffset rule. We can create DateOffset objects to move dates forward to the next valid date.
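The following sketch applies three offsets to one timestamp. The starting date, Friday 1 March 2024, is an arbitrary choice for the example.

```python
import pandas as pd

ts = pd.Timestamp("2024-03-01")  # a Friday

# A calendar offset: one month later.
plus_month = ts + pd.DateOffset(months=1)

# A business-day offset skips the weekend and lands on Monday.
next_bday = ts + pd.offsets.BDay(1)

# MonthEnd rolls the date forward to the last day of the month.
month_end = ts + pd.offsets.MonthEnd(1)

print(plus_month, next_bday, month_end)
```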
13. What does Time Series mean in Pandas?
Time series data is a sequence of observations indexed by time. It is an important source of information that informs strategy in many kinds of organisations, from traditional finance businesses to the education industry.
When machine learning is used to model time series data in order to predict future values, the process is known as time series forecasting.
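In Pandas, a time series is typically a Series or DataFrame with a DatetimeIndex. The sketch below builds two weeks of hypothetical daily values and summarises them by week; the name `visits` and the numbers are invented.

```python
import pandas as pd

# Fourteen daily observations starting on Monday 1 January 2024.
days = pd.date_range("2024-01-01", periods=14, freq="D")
visits = pd.Series(range(1, 15), index=days)

# Date strings can be used to slice a time-indexed Series.
subset = visits.loc["2024-01-03":"2024-01-05"]

# Resample the daily data into weekly totals (weeks ending on Sunday).
weekly = visits.resample("W").sum()

print(subset)
print(weekly)
```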
14. How can we sort the DataFrame?
We can sort a DataFrame in two ways:
- By label
- By actual value
By label
The DataFrame can be sorted by its labels using the sort_index() method. The axis and the sort order can be passed as arguments. By default, the row labels are sorted in ascending order.
By actual value
The sort_values() method sorts the DataFrame by its values rather than by its labels. The column (or columns) to sort by is specified with the 'by' argument.
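Both kinds of sorting are sketched below on a small, deliberately unordered DataFrame with invented values.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["c", "a", "b"], "score": [70, 90, 80]},
    index=[2, 0, 1],
)

# Sort by row labels (ascending by default).
by_label = df.sort_index()

# Sort by row labels in descending order.
by_label_desc = df.sort_index(ascending=False)

# Sort by the values of the "score" column, highest first.
by_value = df.sort_values(by="score", ascending=False)

print(by_label)
print(by_value)
```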
15. How can we create an Excel file from a DataFrame?
The to_excel() function can be used to export a DataFrame to an Excel file.
To write a single object to an Excel file, it is enough to specify the destination file name. To write to several sheets, create an ExcelWriter object with the destination file name and specify the sheet name for each DataFrame you write.
16. How can a DataFrame be converted into a NumPy array?
We can convert a Pandas DataFrame to a NumPy array for use in some advanced mathematical operations, using the DataFrame.to_numpy() function.
Calling DataFrame.to_numpy() on a DataFrame returns its values as a NumPy ndarray.
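A minimal sketch of the conversion, using an invented DataFrame with one integer column and one float column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Convert the DataFrame's values to a two-dimensional NumPy array.
arr = df.to_numpy()

# Mixed integer and float columns are upcast to a common dtype.
print(type(arr), arr.shape, arr.dtype)

# The array can now be passed to NumPy functions.
column_means = np.mean(arr, axis=0)
print(column_means)
```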
17. What is a NumPy array in Pandas?
NumPy is a Python library for working with one-dimensional and multidimensional arrays and for performing numerical computations on their elements. Calculations on NumPy arrays are generally faster than the same calculations on a standard Python list.
18. How can we convert a Series to DataFrame?
The Pandas Series.to_frame() function is used to convert the series object to the DataFrame.
- Series.to_frame(name=None)
name: The name to use for the resulting column. Its default value is None. If a name is passed, it is used in place of the Series name.
- import pandas as pd
- s = pd.Series(["a", "b", "c"], name="vals")
- s.to_frame()
Output:
  vals
0    a
1    b
2    c
19. How can a numpy array be transformed into a dataframe of a certain shape?
As seen in the example below, we can reshape the values of the series p (a NumPy array of 35 elements) into a DataFrame with 7 rows and 5 columns:
- import pandas as pd
- import numpy as np
- # Input: 35 random integers from 1 to 6
- p = pd.Series(np.random.randint(1, 7, 35))
- # Reshape the underlying NumPy array to 7 rows and 5 columns
- info = pd.DataFrame(p.values.reshape(7, 5))
- print(info)
Output (the values are random and will differ between runs):
   0  1  2  3  4
0  3  2  5  5  1
1  3  2  5  5  5
2  1  3  1  2  6
3  1  1  1  2  2
4  3  5  3  3  3
5  2  5  3  6  4
6  3  6  6  6  5
20. How can I calculate the frequency of each unique item in a series?
We can determine the frequency count of each distinct value in p using the example below:
- import pandas as pd
- import numpy as np
- # Input: 17 letters drawn at random from 'pqrstu'
- p = pd.Series(np.take(list('pqrstu'), np.random.randint(6, size=17)))
- p.value_counts()
Output (the values are random and will differ between runs):
s    4
r    4
q    3
p    3
u    3
21. How can I find a numeric series' minimum, median, 75th percentile, and maximum values?
We can calculate p's minimum, 25th percentile, median, 75th percentile, and maximum as shown in the example below:
- import pandas as pd
- import numpy as np
- # Use a seeded random state so the result is reproducible
- state = np.random.RandomState(120)
- p = pd.Series(state.normal(14, 6, 22))
- np.percentile(p, q=[0, 25, 50, 75, 100])
Output:
array([ 4.61498692, 12.15572753, 14.67780756, 17.58054104, 33.24975515])
22. How can I obtain the items that are not common to both series A and series B?
Using the example below, we obtain all the elements of p1 and p2 that are not common to both:
- import pandas as pd
- import numpy as np
- p1 = pd.Series([2, 4, 6, 8, 10])
- p2 = pd.Series([8, 10, 12, 14, 16])
- p_u = pd.Series(np.union1d(p1, p2)) # union
- p_i = pd.Series(np.intersect1d(p1, p2)) # intersection
- # Keep the elements of the union that are not in the intersection
- p_u[~p_u.isin(p_i)]
Output:
0     2
1     4
2     6
5    12
6    14
7    16
dtype: int64
23. How can I obtain the items of series A that are not present in series B?
Using the isin() method, we can remove the items from p1 that are also present in p2.
- import pandas as pd
- p1 = pd.Series([2, 4, 6, 8, 10])
- p2 = pd.Series([8, 10, 12, 14, 16])
- p1[~p1.isin(p2)]
Output:
0    2
1    4
2    6
dtype: int64
24. How is a Pandas DataFrame iterated over?
By utilising a for loop together with a call to iterrows() on the DataFrame, you may loop through the rows of the dataframe.
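A minimal sketch of the loop described above follows, on an invented two-row DataFrame. itertuples() is shown as well, as an alternative way to visit each row.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# iterrows() yields (index label, row as a Series) pairs.
rows = []
for idx, row in df.iterrows():
    rows.append((idx, row["name"], row["score"]))
    print(idx, row["name"], row["score"])

# itertuples() yields one named tuple per row.
for record in df.itertuples(index=False):
    print(record.name, record.score)
```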
25. How do I rename a Pandas DataFrame's index or columns?
The DataFrame's columns or index values can be given new names using the .rename() method.
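The sketch below renames one column and one index label. The old and new names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Pass old-to-new mappings; labels that are not mentioned stay unchanged.
renamed = df.rename(columns={"A": "alpha"}, index={"x": "row1"})

print(renamed)

# rename() returns a new DataFrame; the original keeps its labels.
print(df.columns.tolist())
```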
25. How can a column be added to a pandas DataFrame?
A new column can be added to an existing DataFrame, either from a new Series or from existing columns, as shown in the code below:
- # importing the pandas library
- import pandas as pd
- info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
-         'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
- info = pd.DataFrame(info)
- # Add a new column to an existing DataFrame object
- print("Add new column by passing series")
- info['three'] = pd.Series([20, 40, 60], index=['a', 'b', 'c'])
- print(info)
- print("Add new column using existing DataFrame columns")
- info['four'] = info['one'] + info['three']
- print(info)
Output:
Add new column by passing series
   one  two  three
a  1.0    1   20.0
b  2.0    2   40.0
c  3.0    3   60.0
d  4.0    4    NaN
e  5.0    5    NaN
f  NaN    6    NaN
Add new column using existing DataFrame columns
   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   40.0  42.0
c  3.0    3   60.0  63.0
d  4.0    4    NaN   NaN
e  5.0    5    NaN   NaN
f  NaN    6    NaN   NaN
26. How does Pandas produce a blank DataFrame?
A DataFrame is a two-dimensional array with labelled axes (rows and columns). It is the most commonly used data structure in the pandas library.
The code below shows how to create an empty DataFrame in Pandas:
- # importing the pandas library
- import pandas as pd
- info = pd.DataFrame()
- print(info)
Output:
Empty DataFrame
Columns: []
Index: []
27. How can we create a copy of the series in Pandas?
We can create a copy of a series with the pandas.Series.copy method, using the following syntax:
Series.copy(deep=True)
The call above makes a deep copy, which includes a copy of the data and the indices. If we set deep to False, a shallow copy is made: neither the indices nor the data are copied, and the new object only refers to the original's data and index.
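The sketch below shows that changes to a deep copy do not reach the original. It only constructs the shallow copy without modifying it, because the effect of writing to a shallow copy depends on the pandas version and on whether Copy-on-Write is enabled. The Series contents are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Deep copy (the default): data and index are duplicated.
deep = s.copy()
deep["a"] = 100

# The original is unaffected by changes to the deep copy.
print(s["a"])
print(deep["a"])

# Shallow copy: a new Series object that initially refers to the same data.
shallow = s.copy(deep=False)
print(shallow.equals(s))
```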
27. What does categorical data mean in Pandas?
Categorical is a Pandas data type that corresponds to a categorical variable in statistics. A categorical variable is typically used when there is only a limited, and usually fixed, number of possible values. Examples include gender, blood type, social status, place of birth, length of observation, and ratings on Likert scales. The values of categorical data are always either one of the categories or np.nan.
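A short sketch of the categorical data type follows. The category labels ("low", "medium", "high") and the blood-type values are chosen for illustration.

```python
import pandas as pd

# Store repeated labels as a categorical column.
blood = pd.Series(["A", "B", "O", "A", "AB"], dtype="category")
print(blood.cat.categories)

# An ordered categorical knows that low < medium < high.
levels = ["low", "medium", "high"]
rating = pd.Series(
    pd.Categorical(["low", "high", "medium", "low"], categories=levels, ordered=True)
)
print(rating.min(), rating.max())

# A value outside the declared categories becomes NaN.
unknown = pd.Categorical(["low", "unknown"], categories=levels)
print(pd.isna(unknown))
```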
28. In what ways can a DataFrame be created in Pandas?
A DataFrame can be created in the following ways:
- Lists
- Dict of ndarrays
Example-1: Create a DataFrame using List:
- import pandas as pd
- # a list of strings
- a = ['Python', 'Pandas']
- # Calling DataFrame constructor on list
- info = pd.DataFrame(a)
- print(info)
Output:
        0
0  Python
1  Pandas
Example-2: Create a DataFrame from a dict of lists (a dict of ndarrays is passed in the same way):
- import pandas as pd
- info = {'ID': [101, 102, 103], 'Department': ['B.Sc', 'B.Tech', 'M.Tech']}
- info = pd.DataFrame(info)
- print(info)
Output:
    ID Department
0  101       B.Sc
1  102     B.Tech
2  103     M.Tech
29. What is the name of the utility in the Pandas library that makes scatter plot matrices?
scatter_matrix (available as pandas.plotting.scatter_matrix)
30. What does Reindexing mean in Pandas?
Reindexing conforms a DataFrame to a new index, with optional filling logic. Where a label in the new index was not present in the previous index, NA/NaN is inserted. It returns a new object unless the new index is equivalent to the current one and copy=False. It is used to change the row and column indexes of a DataFrame.