drop_duplicates unhashable type: ‘numpy.ndarray’, ‘set’ and ‘list’ in Pandas

TypeError: unhashable type: 'numpy.ndarray'

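The error unhashable type: 'numpy.ndarray' means you are calling drop_duplicates on a column whose cells contain NumPy arrays. drop_duplicates detects repeats by hashing each value, and mutable containers such as NumPy arrays, lists and sets are not hashable; for a column of lists or sets the message reads unhashable type: 'list' or unhashable type: 'set' instead. You need to convert the values into a hashable type before using drop_duplicates.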

Here’s how you can achieve that:

Python
import pandas as pd

# Sample dataframe
data = {'A': [['1', '2'], ['1', '2'], ['3'], ['4', '5'], ['4', '5']],
        'len': [2, 2, 1, 2, 2]}
df = pd.DataFrame(data)

# Convert each list to a tuple, which is hashable
df['A'] = df['A'].apply(tuple)

# Drop duplicates based on the 'A' column
df.drop_duplicates('A', inplace=True)

# Convert tuples back to lists if needed
df['A'] = df['A'].apply(list)

print(df)

This will correctly drop duplicates based on the ‘A’ column, and the resulting dataframe will look like:

Bash
        A  len
0  [1, 2]    2
2     [3]    1
3  [4, 5]    2

By converting each list or array to a tuple (which is hashable as long as its elements are) before calling drop_duplicates, you avoid the “unhashable type” error. The surviving rows keep their original index labels (0, 2, 3).
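The same idea extends to the other types named in the title. The sketch below is one possible approach, not the only one: to_hashable is a hypothetical helper introduced here for illustration, and the sample data is made up. It maps NumPy arrays and lists to tuples and sets to frozensets, then filters with a temporary key so the original column is never modified.

```python
import numpy as np
import pandas as pd


def to_hashable(value):
    """Hypothetical helper: return a hashable stand-in for an unhashable cell value."""
    if isinstance(value, np.ndarray):
        # tolist() gives (possibly nested) lists of Python scalars; recurse to convert them
        return to_hashable(value.tolist())
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(to_hashable(v) for v in value)
    return value


# Made-up sample data: a column of NumPy arrays
df = pd.DataFrame({
    'A': [np.array([1, 2]), np.array([1, 2]), np.array([3]),
          np.array([4, 5]), np.array([4, 5])],
    'len': [2, 2, 1, 2, 2],
})

# Build a temporary key Series; df['A'] itself still holds the arrays
key = df['A'].apply(to_hashable)

# duplicated() flags repeats of an earlier key, so ~ keeps the first occurrence
deduped = df[~key.duplicated()]
print(deduped.index.tolist())  # [0, 2, 3]

# The same pattern for a Series of sets (element order does not matter for sets)
tags = pd.Series([{'a', 'b'}, {'b', 'a'}, {'c'}])
unique_tags = tags[~tags.apply(to_hashable).duplicated()]
print(unique_tags.index.tolist())  # [0, 2]
```

Filtering with a key avoids the round trip of converting the column to tuples and back to lists, which matters when the cells are arrays you want to keep as arrays.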
